Overview

Dataset statistics

Number of variables: 45
Number of observations: 57588
Missing cells: 44103
Missing cells (%): 1.7%
Duplicate rows: 0
Duplicate rows (%): 0.0%
Total size in memory: 118.7 MiB
Average record size in memory: 2.1 KiB
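
The derived figures in this table follow from the raw counts. The sketch below assumes that "Missing cells (%)" is the share of missing cells among all cells (variables times observations) and that the average record size is the total memory divided by the number of observations; both are assumptions that happen to reproduce the rounded values above.

```python
# Figures copied from the "Dataset statistics" table above.
n_variables = 45
n_observations = 57588
missing_cells = 44103
total_size_mib = 118.7

# Assumption: every observation contributes one cell per variable.
total_cells = n_variables * n_observations
missing_pct = 100 * missing_cells / total_cells

# 1 MiB = 1024 KiB; the reported total is already rounded, so this is approximate.
avg_record_kib = total_size_mib * 1024 / n_observations

print(f"Total cells: {total_cells}")
print(f"Missing cells (%): {missing_pct:.1f}%")
print(f"Average record size: {avg_record_kib:.1f} KiB")
```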

Variable types

CAT: 30
NUM: 13
BOOL: 2

Reproduction

Analysis started: 2020-07-11 23:48:09.973832
Analysis finished: 2020-07-11 23:49:20.284806
Duration: 1 minute and 10.31 seconds
Version: pandas-profiling v2.8.0
Command line: pandas_profiling --config_file config.yaml [YOUR_FILE.csv]
Download configuration: config.yaml
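
The reported duration can be checked against the two timestamps. This is a small sketch using only the Python standard library; the timestamp format string is an assumption based on how the values are printed above.

```python
from datetime import datetime

# Assumed format of the timestamps shown in the Reproduction block.
FMT = "%Y-%m-%d %H:%M:%S.%f"

started = datetime.strptime("2020-07-11 23:48:09.973832", FMT)
finished = datetime.strptime("2020-07-11 23:49:20.284806", FMT)

# Elapsed wall-clock time in seconds, then split into minutes and seconds.
elapsed = (finished - started).total_seconds()
minutes, seconds = divmod(elapsed, 60)

print(f"{int(minutes)} minute(s) and {seconds:.2f} seconds")
```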

Warnings

recorded_by has constant value "GeoData Consultants Ltd" [Constant]
date_recorded has a high cardinality: 353 distinct values [High cardinality]
funder has a high cardinality: 1858 distinct values [High cardinality]
installer has a high cardinality: 2113 distinct values [High cardinality]
wpt_name has a high cardinality: 36720 distinct values [High cardinality]
subvillage has a high cardinality: 18567 distinct values [High cardinality]
lga has a high cardinality: 124 distinct values [High cardinality]
ward has a high cardinality: 2033 distinct values [High cardinality]
scheme_name has a high cardinality: 2658 distinct values [High cardinality]
geometry has a high cardinality: 57519 distinct values [High cardinality]
x is highly correlated with longitude [High correlation]
longitude is highly correlated with x [High correlation]
y is highly correlated with latitude [High correlation]
latitude is highly correlated with y [High correlation]
extraction_type_group is highly correlated with extraction_type and 1 other field [High correlation]
extraction_type is highly correlated with extraction_type_group and 1 other field [High correlation]
extraction_type_class is highly correlated with extraction_type and 1 other field [High correlation]
management_group is highly correlated with management [High correlation]
management is highly correlated with management_group [High correlation]
payment_type is highly correlated with payment [High correlation]
payment is highly correlated with payment_type [High correlation]
quality_group is highly correlated with water_quality [High correlation]
water_quality is highly correlated with quality_group [High correlation]
quantity_group is highly correlated with quantity [High correlation]
quantity is highly correlated with quantity_group [High correlation]
source_type is highly correlated with source and 1 other field [High correlation]
source is highly correlated with source_type and 1 other field [High correlation]
source_class is highly correlated with source and 1 other field [High correlation]
waterpoint_type_group is highly correlated with waterpoint_type [High correlation]
waterpoint_type is highly correlated with waterpoint_type_group [High correlation]
funder has 3622 (6.3%) missing values [Missing]
installer has 3636 (6.3%) missing values [Missing]
public_meeting has 2976 (5.2%) missing values [Missing]
scheme_management has 3750 (6.5%) missing values [Missing]
scheme_name has 26692 (46.3%) missing values [Missing]
permit has 3056 (5.3%) missing values [Missing]
amount_tsh is highly skewed (γ1 = 56.93966707) [Skewed]
num_private is highly skewed (γ1 = 90.52355548) [Skewed]
geometry is uniformly distributed [Uniform]
Unnamed: 0 has unique values [Unique]
id has unique values [Unique]
amount_tsh has 39827 (69.2%) zeros [Zeros]
gps_height has 18626 (32.3%) zeros [Zeros]
num_private has 56831 (98.7%) zeros [Zeros]
population has 19569 (34.0%) zeros [Zeros]
construction_year has 18897 (32.8%) zeros [Zeros]
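
The percentages in the Missing and Zeros warnings can be reproduced from the counts and the number of observations. The sketch below is only an arithmetic check; the helper name `share` is hypothetical and is not part of pandas-profiling.

```python
N_OBSERVATIONS = 57588  # "Number of observations" from the overview

# Counts copied from the Missing and Zeros warnings above.
missing = {"funder": 3622, "installer": 3636, "public_meeting": 2976,
           "scheme_management": 3750, "scheme_name": 26692, "permit": 3056}
zeros = {"amount_tsh": 39827, "gps_height": 18626, "num_private": 56831,
         "population": 19569, "construction_year": 18897}


def share(count, total=N_OBSERVATIONS):
    """Return count as a percentage of total, rounded to one decimal place."""
    return round(100 * count / total, 1)


for name, count in missing.items():
    print(f"{name} has {count} ({share(count)}%) missing values")
for name, count in zeros.items():
    print(f"{name} has {count} ({share(count)}%) zeros")
```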

Variables

Unnamed: 0
Real number (ℝ≥0)

UNIQUE

Distinct count: 57588
Unique (%): 100.0%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 29690.43078419115
Minimum: 0
Maximum: 59399
Zeros: 1
Zeros (%): < 0.1%
Memory size: 450.0 KiB

Quantile statistics

Minimum: 0
5-th percentile: 2980.35
Q1: 14825.75
Median: 29688.5
Q3: 44542.25
95-th percentile: 56421.65
Maximum: 59399
Range: 59399
Interquartile range (IQR): 29716.5

Descriptive statistics

Standard deviation: 17147.06679
Coefficient of variation (CV): 0.5775283933
Kurtosis: -1.200257886
Mean: 29690.43078
Median Absolute Deviation (MAD): 14858.5
Skewness: 0.0006560063793
Sum: 1709812528
Variance: 294021899.4
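
Several of these statistics are tied together by simple identities (mean = sum / n, variance = standard deviation squared, CV = standard deviation / mean, IQR = Q3 - Q1, range = maximum - minimum). The following sketch checks them with the values reported for "Unnamed: 0"; the tolerances are an assumption chosen to absorb the rounding of the printed figures.

```python
import math

# Values copied from the "Unnamed: 0" statistics above.
n = 57588
total = 1709812528
mean = 29690.43078
std = 17147.06679
variance = 294021899.4
cv = 0.5775283933
q1, q3, iqr = 14825.75, 44542.25, 29716.5
minimum, maximum, value_range = 0, 59399, 59399

# Each identity is checked up to the rounding of the reported numbers.
checks = {
    "mean == sum / n": math.isclose(total / n, mean, rel_tol=1e-6),
    "variance == std ** 2": math.isclose(std ** 2, variance, rel_tol=1e-6),
    "cv == std / mean": math.isclose(std / mean, cv, rel_tol=1e-6),
    "iqr == q3 - q1": math.isclose(q3 - q1, iqr),
    "range == max - min": maximum - minimum == value_range,
}
for name, ok in checks.items():
    print(f"{name}: {ok}")
```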
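[Histogram with fixed size bins (bins=10)]

Common values

Value | Count | Frequency (%)
2047 | 1 | < 0.1%
51660 | 1 | < 0.1%
57993 | 1 | < 0.1%
37511 | 1 | < 0.1%
39558 | 1 | < 0.1%
33413 | 1 | < 0.1%
35460 | 1 | < 0.1%
45699 | 1 | < 0.1%
41601 | 1 | < 0.1%
43648 | 1 | < 0.1%
21087 | 1 | < 0.1%
23134 | 1 | < 0.1%
16989 | 1 | < 0.1%
19036 | 1 | < 0.1%
29275 | 1 | < 0.1%
31322 | 1 | < 0.1%
25177 | 1 | < 0.1%
4695 | 1 | < 0.1%
6742 | 1 | < 0.1%
597 | 1 | < 0.1%
2644 | 1 | < 0.1%
12883 | 1 | < 0.1%
14930 | 1 | < 0.1%
8785 | 1 | < 0.1%
10832 | 1 | < 0.1%
Other values (57563) | 57563 | > 99.9%

Minimum 10 values

Value | Count | Frequency (%)
0 | 1 | < 0.1%
1 | 1 | < 0.1%
2 | 1 | < 0.1%
3 | 1 | < 0.1%
4 | 1 | < 0.1%
5 | 1 | < 0.1%
6 | 1 | < 0.1%
7 | 1 | < 0.1%
8 | 1 | < 0.1%
9 | 1 | < 0.1%

Maximum 10 values

Value | Count | Frequency (%)
59399 | 1 | < 0.1%
59398 | 1 | < 0.1%
59397 | 1 | < 0.1%
59396 | 1 | < 0.1%
59395 | 1 | < 0.1%
59394 | 1 | < 0.1%
59393 | 1 | < 0.1%
59392 | 1 | < 0.1%
59391 | 1 | < 0.1%
59390 | 1 | < 0.1%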

id
Real number (ℝ≥0)

UNIQUE

Distinct count: 57588
Unique (%): 100.0%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 37106.48807043134
Minimum: 0
Maximum: 74247
Zeros: 1
Zeros (%): < 0.1%
Memory size: 450.0 KiB

Quantile statistics

Minimum: 0
5-th percentile: 3726.7
Q1: 18522.75
Median: 37054.5
Q3: 55667.25
95-th percentile: 70541.65
Maximum: 74247
Range: 74247
Interquartile range (IQR): 37144.5
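
Fractional percentiles such as 3726.7 for an integer-valued column are consistent with linear interpolation between neighbouring order statistics. The report does not state which interpolation it uses, so the sketch below is an assumption; the function `percentile` and the toy data are hypothetical and only illustrate the technique.

```python
import math


def percentile(sorted_values, q):
    """Percentile q (0-100) of pre-sorted data, using linear interpolation."""
    position = (len(sorted_values) - 1) * q / 100
    lower = math.floor(position)
    upper = math.ceil(position)
    fraction = position - lower
    # Weighted average of the two neighbouring order statistics.
    return sorted_values[lower] * (1 - fraction) + sorted_values[upper] * fraction


# Hypothetical toy data: the integers 0..10, not values from the dataset.
toy = list(range(11))
q1, median, q3 = (percentile(toy, q) for q in (25, 50, 75))
print("Q1:", q1, "median:", median, "Q3:", q3, "IQR:", q3 - q1)
```

With the quartiles reported for id, the same subtraction gives 55667.25 - 18522.75 = 37144.5, which matches the IQR above.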

Descriptive statistics

Standard deviation: 21454.51421
Coefficient of variation (CV): 0.5781876789
Kurtosis: -1.201821343
Mean: 37106.48807
Median Absolute Deviation (MAD): 18569.5
Skewness: 0.002243961664
Sum: 2136888435
Variance: 460296180
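[Histogram with fixed size bins (bins=10)]

Common values

Value | Count | Frequency (%)
2047 | 1 | < 0.1%
20959 | 1 | < 0.1%
4759 | 1 | < 0.1%
661 | 1 | < 0.1%
2708 | 1 | < 0.1%
12947 | 1 | < 0.1%
14994 | 1 | < 0.1%
8849 | 1 | < 0.1%
10896 | 1 | < 0.1%
53903 | 1 | < 0.1%
55950 | 1 | < 0.1%
49805 | 1 | < 0.1%
51852 | 1 | < 0.1%
62091 | 1 | < 0.1%
64138 | 1 | < 0.1%
57993 | 1 | < 0.1%
60040 | 1 | < 0.1%
33413 | 1 | < 0.1%
35460 | 1 | < 0.1%
45699 | 1 | < 0.1%
41601 | 1 | < 0.1%
43648 | 1 | < 0.1%
70263 | 1 | < 0.1%
72310 | 1 | < 0.1%
68212 | 1 | < 0.1%
Other values (57563) | 57563 | > 99.9%

Minimum 10 values

Value | Count | Frequency (%)
0 | 1 | < 0.1%
1 | 1 | < 0.1%
2 | 1 | < 0.1%
3 | 1 | < 0.1%
4 | 1 | < 0.1%
5 | 1 | < 0.1%
6 | 1 | < 0.1%
7 | 1 | < 0.1%
8 | 1 | < 0.1%
9 | 1 | < 0.1%

Maximum 10 values

Value | Count | Frequency (%)
74247 | 1 | < 0.1%
74246 | 1 | < 0.1%
74243 | 1 | < 0.1%
74242 | 1 | < 0.1%
74240 | 1 | < 0.1%
74239 | 1 | < 0.1%
74238 | 1 | < 0.1%
74237 | 1 | < 0.1%
74236 | 1 | < 0.1%
74235 | 1 | < 0.1%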

amount_tsh
Real number (ℝ≥0)

SKEWED
ZEROS

Distinct count: 98
Unique (%): 0.2%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 327.64521862193516
Minimum: 0.0
Maximum: 350000.0
Zeros: 39827
Zeros (%): 69.2%
Memory size: 450.0 KiB

Quantile statistics

Minimum: 0
5-th percentile: 0
Q1: 0
Median: 0
Q3: 30
95-th percentile: 1200
Maximum: 350000
Range: 350000
Interquartile range (IQR): 30

Descriptive statistics

Standard deviation: 3043.831403
Coefficient of variation (CV): 9.290022347
Kurtosis: 4756.496721
Mean: 327.6452186
Median Absolute Deviation (MAD): 0
Skewness: 56.93966707
Sum: 18868432.85
Variance: 9264909.609
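
A column dominated by zeros with a few very large values, like amount_tsh, produces a large positive skewness. The sketch below computes the moment coefficient of skewness, g1 = m3 / m2^1.5, on made-up data. The report does not say which estimator sits behind its γ1 value (it may apply a small-sample correction), so this is an illustration of the idea, not a reproduction of 56.94.

```python
def moment_skewness(values):
    """Population skewness g1 = m3 / m2**1.5, without small-sample correction."""
    n = len(values)
    mean = sum(values) / n
    m2 = sum((v - mean) ** 2 for v in values) / n  # second central moment
    m3 = sum((v - mean) ** 3 for v in values) / n  # third central moment
    return m3 / m2 ** 1.5


# Hypothetical toy samples, not values from the dataset.
symmetric = [1, 2, 3, 4, 5]
zero_heavy = [0] * 95 + [10, 20, 50, 500, 5000]

print("symmetric:", moment_skewness(symmetric))
print("zero-heavy:", moment_skewness(zero_heavy))
```

The symmetric sample has zero skewness, while the zero-heavy sample with a long right tail is strongly positive.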
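[Histogram with fixed size bins (bins=10)]

Common values

Value | Count | Frequency (%)
0 | 39827 | 69.2%
500 | 3102 | 5.4%
50 | 2472 | 4.3%
1000 | 1488 | 2.6%
20 | 1463 | 2.5%
200 | 1220 | 2.1%
100 | 816 | 1.4%
10 | 806 | 1.4%
30 | 743 | 1.3%
2000 | 704 | 1.2%
250 | 569 | 1.0%
300 | 557 | 1.0%
5000 | 450 | 0.8%
5 | 376 | 0.7%
25 | 356 | 0.6%
3000 | 334 | 0.6%
1200 | 267 | 0.5%
1500 | 197 | 0.3%
6 | 190 | 0.3%
600 | 176 | 0.3%
4000 | 156 | 0.3%
2400 | 145 | 0.3%
2500 | 139 | 0.2%
6000 | 125 | 0.2%
7 | 69 | 0.1%
Other values (73) | 841 | 1.5%

Minimum 10 values

Value | Count | Frequency (%)
0 | 39827 | 69.2%
0.2 | 3 | < 0.1%
0.25 | 1 | < 0.1%
1 | 3 | < 0.1%
2 | 13 | < 0.1%
5 | 376 | 0.7%
6 | 190 | 0.3%
7 | 69 | 0.1%
9 | 1 | < 0.1%
10 | 806 | 1.4%

Maximum 10 values

Value | Count | Frequency (%)
350000 | 1 | < 0.1%
250000 | 1 | < 0.1%
200000 | 1 | < 0.1%
170000 | 1 | < 0.1%
138000 | 1 | < 0.1%
120000 | 1 | < 0.1%
117000 | 7 | < 0.1%
100000 | 3 | < 0.1%
70000 | 1 | < 0.1%
60000 | 1 | < 0.1%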

date_recorded
Categorical

HIGH CARDINALITY

Distinct count: 353
Unique (%): 0.6%
Missing: 0
Missing (%): 0.0%
Memory size: 450.0 KiB

Common values

Value | Count | Frequency (%)
2011-03-15 | 572 | 1.0%
2011-03-17 | 558 | 1.0%
2013-02-03 | 545 | 0.9%
2011-03-14 | 520 | 0.9%
2011-03-16 | 513 | 0.9%
2011-03-18 | 497 | 0.9%
2011-03-19 | 466 | 0.8%
2011-03-04 | 458 | 0.8%
2011-03-05 | 434 | 0.8%
2013-01-24 | 433 | 0.8%
2013-03-15 | 428 | 0.7%
2013-02-14 | 427 | 0.7%
2011-03-11 | 426 | 0.7%
2013-01-29 | 418 | 0.7%
2011-03-23 | 417 | 0.7%
2011-03-09 | 416 | 0.7%
2013-02-04 | 411 | 0.7%
2013-02-15 | 399 | 0.7%
2011-03-30 | 391 | 0.7%
2013-02-26 | 391 | 0.7%
2011-03-24 | 381 | 0.7%
2013-02-16 | 381 | 0.7%
2013-03-19 | 381 | 0.7%
2013-02-13 | 380 | 0.7%
2013-01-30 | 380 | 0.7%
Other values (328) | 46565 | 80.9%

Length

Max length: 10
Median length: 10
Mean length: 10
Min length: 10

Overview of Unicode Properties

Unique Unicode characters: 11
Unique Unicode categories: 2
Unique Unicode scripts: 1
Unique Unicode blocks: 1
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.
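
As an illustration of such character properties, the Python standard library exposes the Unicode general category of each code point through `unicodedata.category`. The sketch below counts categories over two sample strings shaped like date_recorded values; the helper `category_counts` is hypothetical, and the two-letter codes `Nd` and `Pd` correspond to the "Decimal Number" and "Dash Punctuation" categories named in this report.

```python
import unicodedata
from collections import Counter


def category_counts(values):
    """Count Unicode general categories over all characters in the given strings."""
    return Counter(unicodedata.category(ch) for value in values for ch in value)


# Two date strings in the same YYYY-MM-DD shape as date_recorded.
sample = ["2011-03-15", "2013-02-03"]
counts = category_counts(sample)
total = sum(counts.values())

for code, count in counts.most_common():
    print(f"{code}: {count} ({100 * count / total:.1f}%)")
```

Every YYYY-MM-DD string has eight digits and two dashes, which is why the category split for this variable is exactly 80% to 20%.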

Most occurring characters

Value | Count | Frequency (%)
0 | 134947 | 23.4%
1 | 125003 | 21.7%
- | 115176 | 20.0%
2 | 100193 | 17.4%
3 | 51822 | 9.0%
7 | 12444 | 2.2%
4 | 10491 | 1.8%
8 | 8920 | 1.5%
6 | 5943 | 1.0%
5 | 5838 | 1.0%
9 | 5103 | 0.9%

Most occurring categories

Value | Count | Frequency (%)
Decimal Number | 460704 | 80.0%
Dash Punctuation | 115176 | 20.0%

Most frequent Decimal Number characters

Value | Count | Frequency (%)
0 | 134947 | 29.3%
1 | 125003 | 27.1%
2 | 100193 | 21.7%
3 | 51822 | 11.2%
7 | 12444 | 2.7%
4 | 10491 | 2.3%
8 | 8920 | 1.9%
6 | 5943 | 1.3%
5 | 5838 | 1.3%
9 | 5103 | 1.1%

Most frequent Dash Punctuation characters

Value | Count | Frequency (%)
- | 115176 | 100.0%

Most occurring scripts

Value | Count | Frequency (%)
Common | 575880 | 100.0%

Most frequent Common characters

Value | Count | Frequency (%)
0 | 134947 | 23.4%
1 | 125003 | 21.7%
- | 115176 | 20.0%
2 | 100193 | 17.4%
3 | 51822 | 9.0%
7 | 12444 | 2.2%
4 | 10491 | 1.8%
8 | 8920 | 1.5%
6 | 5943 | 1.0%
5 | 5838 | 1.0%
9 | 5103 | 0.9%

Most occurring blocks

Value | Count | Frequency (%)
ASCII | 575880 | 100.0%

Most frequent ASCII characters

Value | Count | Frequency (%)
0 | 134947 | 23.4%
1 | 125003 | 21.7%
- | 115176 | 20.0%
2 | 100193 | 17.4%
3 | 51822 | 9.0%
7 | 12444 | 2.2%
4 | 10491 | 1.8%
8 | 8920 | 1.5%
6 | 5943 | 1.0%
5 | 5838 | 1.0%
9 | 5103 | 0.9%

funder
Categorical

HIGH CARDINALITY
MISSING

Distinct count: 1858
Unique (%): 3.4%
Missing: 3622
Missing (%): 6.3%
Memory size: 450.0 KiB

Common values

Value | Count | Frequency (%)
Government Of Tanzania | 8842 | 15.4%
Danida | 3114 | 5.4%
Hesawa | 1914 | 3.3%
World Bank | 1345 | 2.3%
Kkkt | 1287 | 2.2%
World Vision | 1224 | 2.1%
Rwssp | 1187 | 2.1%
Unicef | 1035 | 1.8%
District Council | 843 | 1.5%
Tasaf | 834 | 1.4%
Dhv | 829 | 1.4%
Private Individual | 824 | 1.4%
0 | 777 | 1.3%
Norad | 765 | 1.3%
Germany Republi | 610 | 1.1%
Tcrs | 602 | 1.0%
Ministry Of Water | 590 | 1.0%
Water | 583 | 1.0%
Dwe | 484 | 0.8%
Netherlands | 461 | 0.8%
Hifab | 450 | 0.8%
Adb | 448 | 0.8%
Lga | 442 | 0.8%
Amref | 425 | 0.7%
Fini Water | 393 | 0.7%
Other values (1833) | 23658 | 41.1%
(Missing) | 3622 | 6.3%

Length

Max length: 30
Median length: 6
Mean length: 9.563693825
Min length: 1

Overview of Unicode Properties

Unique Unicode characters: 69
Unique Unicode categories: 9
Unique Unicode scripts: 2
Unique Unicode blocks: 1
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.

Most occurring characters

ValueCountFrequency (%) 
a7009512.7%
 
n6377611.6%
 
i374426.8%
 
e365316.6%
 
340536.2%
 
r275245.0%
 
t225504.1%
 
o223174.1%
 
s158432.9%
 
d152672.8%
 
f150172.7%
 
m148352.7%
 
v126942.3%
 
T118062.1%
 
l109922.0%
 
G104621.9%
 
O103621.9%
 
z94371.7%
 
c91191.7%
 
u78601.4%
 
D74271.3%
 
W71931.3%
 
w69361.3%
 
k64891.2%
 
p61961.1%
 
Other values (44)5853110.6%
 

Most occurring categories

ValueCountFrequency (%) 
Lowercase Letter42593577.3%
 
Uppercase Letter8729115.8%
 
Space Separator340536.2%
 
Other Punctuation13170.2%
 
Decimal Number8010.1%
 
Open Punctuation4360.1%
 
Close Punctuation4310.1%
 
Dash Punctuation3230.1%
 
Connector Punctuation167< 0.1%
 

Most frequent Uppercase Letter characters

ValueCountFrequency (%) 
T1180613.5%
 
G1046212.0%
 
O1036211.9%
 
D74278.5%
 
W71938.2%
 
C46625.3%
 
R42364.9%
 
H31393.6%
 
M31313.6%
 
K29483.4%
 
A28913.3%
 
S26313.0%
 
I24202.8%
 
B20502.3%
 
N20102.3%
 
P19222.2%
 
U18552.1%
 
V17722.0%
 
L14041.6%
 
F13791.6%
 
J7950.9%
 
E4350.5%
 
Y2330.3%
 
Q1110.1%
 
Z16< 0.1%
 

Most frequent Lowercase Letter characters

ValueCountFrequency (%) 
a7009516.5%
 
n6377615.0%
 
i374428.8%
 
e365318.6%
 
r275246.5%
 
t225505.3%
 
o223175.2%
 
s158433.7%
 
d152673.6%
 
f150173.5%
 
m148353.5%
 
v126943.0%
 
l109922.6%
 
z94372.2%
 
c91192.1%
 
u78601.8%
 
w69361.6%
 
k64891.5%
 
p61961.5%
 
h56771.3%
 
g30350.7%
 
b27270.6%
 
y26700.6%
 
x5650.1%
 
j3100.1%
 

Most frequent Space Separator characters

ValueCountFrequency (%) 
34053100.0%
 

Most frequent Open Punctuation characters

ValueCountFrequency (%) 
(43499.5%
 
[20.5%
 

Most frequent Close Punctuation characters

ValueCountFrequency (%) 
)42999.5%
 
]20.5%
 

Most frequent Other Punctuation characters

ValueCountFrequency (%) 
/78359.5%
 
.46935.6%
 
\332.5%
 
&211.6%
 
'110.8%
 

Most frequent Connector Punctuation characters

ValueCountFrequency (%) 
_167100.0%
 

Most frequent Decimal Number characters

ValueCountFrequency (%) 
079198.8%
 
250.6%
 
120.2%
 
920.2%
 
410.1%
 

Most frequent Dash Punctuation characters

ValueCountFrequency (%) 
-323100.0%
 

Most occurring scripts

ValueCountFrequency (%) 
Latin51322693.2%
 
Common375286.8%
 

Most frequent Latin characters

ValueCountFrequency (%) 
a7009513.7%
 
n6377612.4%
 
i374427.3%
 
e365317.1%
 
r275245.4%
 
t225504.4%
 
o223174.3%
 
s158433.1%
 
d152673.0%
 
f150172.9%
 
m148352.9%
 
v126942.5%
 
T118062.3%
 
l109922.1%
 
G104622.0%
 
O103622.0%
 
z94371.8%
 
c91191.8%
 
u78601.5%
 
D74271.4%
 
W71931.4%
 
w69361.4%
 
k64891.3%
 
p61961.2%
 
h56771.1%
 
Other values (27)493799.6%
 

Most frequent Common characters

ValueCountFrequency (%) 
3405390.7%
 
07912.1%
 
/7832.1%
 
.4691.2%
 
(4341.2%
 
)4291.1%
 
-3230.9%
 
_1670.4%
 
\330.1%
 
&210.1%
 
'11< 0.1%
 
25< 0.1%
 
12< 0.1%
 
[2< 0.1%
 
]2< 0.1%
 
92< 0.1%
 
41< 0.1%
 

Most occurring blocks

ValueCountFrequency (%) 
ASCII550754100.0%
 

Most frequent ASCII characters

ValueCountFrequency (%) 
a7009512.7%
 
n6377611.6%
 
i374426.8%
 
e365316.6%
 
340536.2%
 
r275245.0%
 
t225504.1%
 
o223174.1%
 
s158432.9%
 
d152672.8%
 
f150172.7%
 
m148352.7%
 
v126942.3%
 
T118062.1%
 
l109922.0%
 
G104621.9%
 
O103621.9%
 
z94371.7%
 
c91191.7%
 
u78601.4%
 
D74271.3%
 
W71931.3%
 
w69361.3%
 
k64891.2%
 
p61961.1%
 
Other values (44)5853110.6%
 

gps_height
Real number (ℝ)

ZEROS

Distinct count: 2428
Unique (%): 4.2%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 689.3251371813573
Minimum: -90
Maximum: 2770
Zeros: 18626
Zeros (%): 32.3%
Memory size: 450.0 KiB

Quantile statistics

Minimum: -90
5-th percentile: 0
Q1: 0
Median: 426
Q3: 1332
95-th percentile: 1803
Maximum: 2770
Range: 2860
Interquartile range (IQR): 1332

Descriptive statistics

Standard deviation: 693.564188
Coefficient of variation (CV): 1.006149567
Kurtosis: -1.326008097
Mean: 689.3251372
Median Absolute Deviation (MAD): 426
Skewness: 0.4131933762
Sum: 39696856
Variance: 481031.2829
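[Histogram with fixed size bins (bins=10)]

Common values

Value | Count | Frequency (%)
0 | 18626 | 32.3%
-15 | 60 | 0.1%
-16 | 55 | 0.1%
-13 | 55 | 0.1%
-20 | 52 | 0.1%
1290 | 52 | 0.1%
-14 | 51 | 0.1%
303 | 51 | 0.1%
-18 | 49 | 0.1%
-19 | 47 | 0.1%
1269 | 46 | 0.1%
1295 | 46 | 0.1%
1304 | 45 | 0.1%
-23 | 45 | 0.1%
280 | 44 | 0.1%
1538 | 44 | 0.1%
1286 | 44 | 0.1%
-8 | 44 | 0.1%
-17 | 44 | 0.1%
1332 | 43 | 0.1%
320 | 43 | 0.1%
1317 | 42 | 0.1%
1293 | 42 | 0.1%
1319 | 42 | 0.1%
1359 | 42 | 0.1%
Other values (2403) | 37834 | 65.7%

Minimum 10 values

Value | Count | Frequency (%)
-90 | 1 | < 0.1%
-63 | 2 | < 0.1%
-59 | 1 | < 0.1%
-57 | 1 | < 0.1%
-55 | 1 | < 0.1%
-54 | 1 | < 0.1%
-53 | 1 | < 0.1%
-52 | 2 | < 0.1%
-51 | 2 | < 0.1%
-50 | 5 | < 0.1%

Maximum 10 values

Value | Count | Frequency (%)
2770 | 1 | < 0.1%
2628 | 1 | < 0.1%
2627 | 1 | < 0.1%
2626 | 2 | < 0.1%
2623 | 1 | < 0.1%
2614 | 1 | < 0.1%
2585 | 1 | < 0.1%
2576 | 1 | < 0.1%
2569 | 1 | < 0.1%
2568 | 1 | < 0.1%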

installer
Categorical

HIGH CARDINALITY
MISSING

Distinct count: 2113
Unique (%): 3.9%
Missing: 3636
Missing (%): 6.3%
Memory size: 450.0 KiB

Common values

Value | Count | Frequency (%)
DWE | 16255 | 28.2%
Government | 1670 | 2.9%
RWE | 1181 | 2.1%
Commu | 1060 | 1.8%
DANIDA | 1050 | 1.8%
KKKT | 897 | 1.6%
Hesawa | 803 | 1.4%
0 | 777 | 1.3%
TCRS | 707 | 1.2%
Central government | 619 | 1.1%
CES | 610 | 1.1%
DANID | 552 | 1.0%
District Council | 551 | 1.0%
Community | 539 | 0.9%
HESAWA | 537 | 0.9%
World vision | 408 | 0.7%
LGA | 408 | 0.7%
WEDECO | 397 | 0.7%
District council | 392 | 0.7%
Gover | 383 | 0.7%
TASAF | 377 | 0.7%
AMREF | 329 | 0.6%
TWESA | 316 | 0.5%
WU | 301 | 0.5%
Dmdd | 287 | 0.5%
Other values (2088) | 22546 | 39.2%
(Missing) | 3636 | 6.3%

Length

Max length: 30
Median length: 4
Mean length: 5.962926304
Min length: 1

Overview of Unicode Properties

Unique Unicode characters: 69
Unique Unicode categories: 10
Unique Unicode scripts: 2
Unique Unicode blocks: 1
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.

Most occurring characters

ValueCountFrequency (%) 
D263667.7%
 
W244987.1%
 
E242007.0%
 
n232646.8%
 
a207216.0%
 
e150834.4%
 
i149234.3%
 
A134873.9%
 
r131393.8%
 
t126223.7%
 
125723.7%
 
o121113.5%
 
C104523.0%
 
m90902.6%
 
S66241.9%
 
R64751.9%
 
l61191.8%
 
s61051.8%
 
I59821.7%
 
T58231.7%
 
u54161.6%
 
K53751.6%
 
c48151.4%
 
N46321.3%
 
G42901.2%
 
Other values (44)4920914.3%
 

Most occurring categories

ValueCountFrequency (%) 
Lowercase Letter16623448.4%
 
Uppercase Letter16222847.2%
 
Space Separator125723.7%
 
Other Punctuation9640.3%
 
Decimal Number7810.2%
 
Dash Punctuation2680.1%
 
Connector Punctuation169< 0.1%
 
Open Punctuation159< 0.1%
 
Close Punctuation16< 0.1%
 
Currency Symbol2< 0.1%
 

Most frequent Uppercase Letter characters

ValueCountFrequency (%) 
D2636616.3%
 
W2449815.1%
 
E2420014.9%
 
A134878.3%
 
C104526.4%
 
S66244.1%
 
R64754.0%
 
I59823.7%
 
T58233.6%
 
K53753.3%
 
N46322.9%
 
G42902.6%
 
M42262.6%
 
H33792.1%
 
F30891.9%
 
O30881.9%
 
L23511.4%
 
U22261.4%
 
P18831.2%
 
V14760.9%
 
B7940.5%
 
J7250.4%
 
X3560.2%
 
Y2450.2%
 
Z1280.1%
 

Most frequent Lowercase Letter characters

ValueCountFrequency (%) 
n2326414.0%
 
a2072112.5%
 
e150839.1%
 
i149239.0%
 
r131397.9%
 
t126227.6%
 
o121117.3%
 
m90905.5%
 
l61193.7%
 
s61053.7%
 
u54163.3%
 
c48152.9%
 
v42742.6%
 
d41782.5%
 
w32932.0%
 
g26701.6%
 
y17691.1%
 
h16961.0%
 
p14190.9%
 
k13920.8%
 
f8020.5%
 
b5030.3%
 
j4820.3%
 
z3200.2%
 
x14< 0.1%
 

Most frequent Space Separator characters

ValueCountFrequency (%) 
12572100.0%
 

Most frequent Connector Punctuation characters

ValueCountFrequency (%) 
_169100.0%
 

Most frequent Other Punctuation characters

ValueCountFrequency (%) 
/66969.4%
 
.23624.5%
 
&485.0%
 
'111.1%
 

Most frequent Decimal Number characters

ValueCountFrequency (%) 
077899.6%
 
110.1%
 
410.1%
 
910.1%
 

Most frequent Dash Punctuation characters

ValueCountFrequency (%) 
-268100.0%
 

Most frequent Open Punctuation characters

ValueCountFrequency (%) 
(15798.7%
 
[21.3%
 

Most frequent Close Punctuation characters

ValueCountFrequency (%) 
}1381.2%
 
]212.5%
 
)16.2%
 

Most frequent Currency Symbol characters

ValueCountFrequency (%) 
$2100.0%
 

Most occurring scripts

ValueCountFrequency (%) 
Latin32846295.7%
 
Common149314.3%
 

Most frequent Latin characters

ValueCountFrequency (%) 
D263668.0%
 
W244987.5%
 
E242007.4%
 
n232647.1%
 
a207216.3%
 
e150834.6%
 
i149234.5%
 
A134874.1%
 
r131394.0%
 
t126223.8%
 
o121113.7%
 
C104523.2%
 
m90902.8%
 
S66242.0%
 
R64752.0%
 
l61191.9%
 
s61051.9%
 
I59821.8%
 
T58231.8%
 
u54161.6%
 
K53751.6%
 
c48151.5%
 
N46321.4%
 
G42901.3%
 
v42741.3%
 
Other values (27)4257613.0%
 

Most frequent Common characters

ValueCountFrequency (%) 
1257284.2%
 
07785.2%
 
/6694.5%
 
-2681.8%
 
.2361.6%
 
_1691.1%
 
(1571.1%
 
&480.3%
 
}130.1%
 
'110.1%
 
$2< 0.1%
 
[2< 0.1%
 
]2< 0.1%
 
)1< 0.1%
 
11< 0.1%
 
41< 0.1%
 
91< 0.1%
 

Most occurring blocks

ValueCountFrequency (%) 
ASCII343393100.0%
 

Most frequent ASCII characters

ValueCountFrequency (%) 
D263667.7%
 
W244987.1%
 
E242007.0%
 
n232646.8%
 
a207216.0%
 
e150834.4%
 
i149234.3%
 
A134873.9%
 
r131393.8%
 
t126223.7%
 
125723.7%
 
o121113.5%
 
C104523.0%
 
m90902.6%
 
S66241.9%
 
R64751.9%
 
l61191.8%
 
s61051.8%
 
I59821.7%
 
T58231.7%
 
u54161.6%
 
K53751.6%
 
c48151.4%
 
N46321.3%
 
G42901.2%
 
Other values (44)4920914.3%
 

longitude
Real number (ℝ≥0)

HIGH CORRELATION

Distinct count: 57515
Unique (%): 99.9%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 35.149669123888835
Minimum: 29.6071219
Maximum: 40.34519307
Zeros: 0
Zeros (%): 0.0%
Memory size: 450.0 KiB

Quantile statistics

Minimum: 29.6071219
5-th percentile: 30.62360773
Q1: 33.28510016
Median: 35.00594322
Q3: 37.23371212
95-th percentile: 39.15049865
Maximum: 40.34519307
Range: 10.73807117
Interquartile range (IQR): 3.94861196

Descriptive statistics

Standard deviation: 2.60742797
Coefficient of variation (CV): 0.07418072587
Kurtosis: -0.8692761515
Mean: 35.14966912
Median Absolute Deviation (MAD): 1.979294605
Skewness: -0.1348112926
Sum: 2024199.146
Variance: 6.798680617
[Histogram with fixed size bins (bins=10)]

Common values

Value | Count | Frequency (%)
33.09034738 | 2 | < 0.1%
39.08628657 | 2 | < 0.1%
39.09309544 | 2 | < 0.1%
39.09851362 | 2 | < 0.1%
37.54340145 | 2 | < 0.1%
32.98856004 | 2 | < 0.1%
32.95652279 | 2 | < 0.1%
32.98767048 | 2 | < 0.1%
32.96700926 | 2 | < 0.1%
32.99327684 | 2 | < 0.1%
39.08596496 | 2 | < 0.1%
37.53432734 | 2 | < 0.1%
31.61952953 | 2 | < 0.1%
39.09568416 | 2 | < 0.1%
39.08618257 | 2 | < 0.1%
37.25219446 | 2 | < 0.1%
32.96573445 | 2 | < 0.1%
37.37571687 | 2 | < 0.1%
37.31891128 | 2 | < 0.1%
37.37401655 | 2 | < 0.1%
32.98269806 | 2 | < 0.1%
37.54090064 | 2 | < 0.1%
39.08887513 | 2 | < 0.1%
38.34050134 | 2 | < 0.1%
39.11921037 | 2 | < 0.1%
Other values (57490) | 57538 | 99.9%

Minimum 10 values

Value | Count | Frequency (%)
29.6071219 | 1 | < 0.1%
29.60720109 | 1 | < 0.1%
29.61032056 | 1 | < 0.1%
29.61096482 | 1 | < 0.1%
29.61194674 | 1 | < 0.1%
29.61250689 | 1 | < 0.1%
29.61276296 | 1 | < 0.1%
29.61344309 | 1 | < 0.1%
29.6168718 | 1 | < 0.1%
29.61847919 | 1 | < 0.1%

Maximum 10 values

Value | Count | Frequency (%)
40.34519307 | 1 | < 0.1%
40.34430089 | 1 | < 0.1%
40.32523996 | 1 | < 0.1%
40.32522643 | 1 | < 0.1%
40.32340181 | 1 | < 0.1%
40.32283237 | 1 | < 0.1%
40.32280453 | 1 | < 0.1%
40.3226251 | 1 | < 0.1%
40.32216902 | 1 | < 0.1%
40.32196593 | 1 | < 0.1%

latitude
Real number (ℝ)

HIGH CORRELATION

Distinct count: 57516
Unique (%): 99.9%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: -5.885572340514864
Minimum: -11.64944018
Maximum: -0.99846435
Zeros: 0
Zeros (%): 0.0%
Memory size: 450.0 KiB

Quantile statistics

Minimum: -11.64944018
5-th percentile: -10.60147827
Q1: -8.643840785
Median: -5.17270373
Q3: -3.372824195
95-th percentile: -1.802689797
Maximum: -0.99846435
Range: 10.65097583
Interquartile range (IQR): 5.27101659

Descriptive statistics

Standard deviation: 2.809876457
Coefficient of variation (CV): -0.477417708
Kurtosis: -1.203165882
Mean: -5.885572341
Median Absolute Deviation (MAD): 2.041399535
Skewness: -0.2522877584
Sum: -338938.3399
Variance: 7.895405705
[Histogram with fixed size bins (bins=10)]

Common values

Value | Count | Frequency (%)
-6.97627011 | 2 | < 0.1%
-6.98584173 | 2 | < 0.1%
-7.05692253 | 2 | < 0.1%
-6.9787555 | 2 | < 0.1%
-6.95974873 | 2 | < 0.1%
-6.96355665 | 2 | < 0.1%
-2.46390984 | 2 | < 0.1%
-7.10374232 | 2 | < 0.1%
-6.98318263 | 2 | < 0.1%
-2.51995041 | 2 | < 0.1%
-2.52871573 | 2 | < 0.1%
-6.9802204 | 2 | < 0.1%
-6.98945622 | 2 | < 0.1%
-2.50658954 | 2 | < 0.1%
-6.95674564 | 2 | < 0.1%
-7.10462503 | 2 | < 0.1%
-2.51661939 | 2 | < 0.1%
-2.49454559 | 2 | < 0.1%
-6.96247516 | 2 | < 0.1%
-6.98311512 | 2 | < 0.1%
-2.49645868 | 2 | < 0.1%
-9.2893492 | 2 | < 0.1%
-2.51532072 | 2 | < 0.1%
-6.99054864 | 2 | < 0.1%
-6.9642576 | 2 | < 0.1%
Other values (57491) | 57538 | 99.9%

Minimum 10 values

Value | Count | Frequency (%)
-11.64944018 | 1 | < 0.1%
-11.64837759 | 1 | < 0.1%
-11.58629656 | 1 | < 0.1%
-11.56857679 | 1 | < 0.1%
-11.56680457 | 1 | < 0.1%
-11.56450865 | 1 | < 0.1%
-11.56432357 | 1 | < 0.1%
-11.56231592 | 1 | < 0.1%
-11.56228898 | 1 | < 0.1%
-11.56161898 | 1 | < 0.1%

Maximum 10 values

Value | Count | Frequency (%)
-0.99846435 | 1 | < 0.1%
-0.998916 | 1 | < 0.1%
-0.99901209 | 1 | < 0.1%
-0.99911702 | 1 | < 0.1%
-0.9994692 | 1 | < 0.1%
-0.99950651 | 1 | < 0.1%
-0.99952232 | 1 | < 0.1%
-1.00058519 | 1 | < 0.1%
-1.0015208 | 1 | < 0.1%
-1.00198784 | 1 | < 0.1%

wpt_name
Categorical

HIGH CARDINALITY

Distinct count: 36720
Unique (%): 63.8%
Missing: 0
Missing (%): 0.0%
Memory size: 450.0 KiB

Common values

Value | Count | Frequency (%)
none | 3492 | 6.1%
Shuleni | 1734 | 3.0%
Zahanati | 814 | 1.4%
Msikitini | 533 | 0.9%
Kanisani | 322 | 0.6%
Sokoni | 256 | 0.4%
Ofisini | 245 | 0.4%
Shule Ya Msingi | 199 | 0.3%
School | 197 | 0.3%
Bombani | 155 | 0.3%
Shule | 152 | 0.3%
Sekondari | 145 | 0.3%
Madukani | 101 | 0.2%
Hospital | 86 | 0.1%
Mkombozi | 84 | 0.1%
Mbugani | 84 | 0.1%
Kituo Cha Afya | 80 | 0.1%
Kisimani | 78 | 0.1%
Mkuyuni | 77 | 0.1%
Ccm | 76 | 0.1%
Ofisi Ya Kijiji | 76 | 0.1%
Muungano | 76 | 0.1%
Center | 73 | 0.1%
Tankini | 73 | 0.1%
Bwawani | 65 | 0.1%
Other values (36695) | 48315 | 83.9%

Length

Max length: 30
Median length: 10
Mean length: 11.03274988
Min length: 1

Overview of Unicode Properties

Unique Unicode characters: 75
Unique Unicode categories: 10
Unique Unicode scripts: 2
Unique Unicode blocks: 1
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.

Most occurring characters

ValueCountFrequency (%) 
a9629715.2%
 
i510388.0%
 
493767.8%
 
n408786.4%
 
e401126.3%
 
w313014.9%
 
K310864.9%
 
o292614.6%
 
u234333.7%
 
M214753.4%
 
l204003.2%
 
m169302.7%
 
h168552.7%
 
s164182.6%
 
r138492.2%
 
g125752.0%
 
t112981.8%
 
k107431.7%
 
S105191.7%
 
d100891.6%
 
b100611.6%
 
y75621.2%
 
z61621.0%
 
c49400.8%
 
N47100.7%
 
Other values (50)479867.6%
 

Most occurring categories

ValueCountFrequency (%) 
Lowercase Letter48049575.6%
 
Uppercase Letter10288516.2%
 
Space Separator493767.8%
 
Decimal Number16770.3%
 
Other Punctuation7010.1%
 
Dash Punctuation103< 0.1%
 
Open Punctuation37< 0.1%
 
Close Punctuation37< 0.1%
 
Connector Punctuation24< 0.1%
 
Modifier Symbol19< 0.1%
 

Most frequent Lowercase Letter characters

ValueCountFrequency (%) 
a9629720.0%
 
i5103810.6%
 
n408788.5%
 
e401128.3%
 
w313016.5%
 
o292616.1%
 
u234334.9%
 
l204004.2%
 
m169303.5%
 
h168553.5%
 
s164183.4%
 
r138492.9%
 
g125752.6%
 
t112982.4%
 
k107432.2%
 
d100892.1%
 
b100612.1%
 
y75621.6%
 
z61621.3%
 
c49401.0%
 
p34700.7%
 
j33380.7%
 
f22550.5%
 
v10300.2%
 
x126< 0.1%
 

Most frequent Uppercase Letter characters

ValueCountFrequency (%) 
K3108630.2%
 
M2147520.9%
 
S1051910.2%
 
N47104.6%
 
A34303.3%
 
B32223.1%
 
C27172.6%
 
P25072.4%
 
L24752.4%
 
J23202.3%
 
Y19481.9%
 
T18881.8%
 
I17441.7%
 
R16171.6%
 
H15731.5%
 
Z14981.5%
 
D14001.4%
 
G12991.3%
 
O12151.2%
 
E11911.2%
 
U9320.9%
 
W8620.8%
 
F8100.8%
 
V3870.4%
 
Q530.1%
 

Most frequent Space Separator characters

ValueCountFrequency (%) 
49376100.0%
 

Most frequent Other Punctuation characters

ValueCountFrequency (%) 
'39656.5%
 
.17424.8%
 
/12818.3%
 
&20.3%
 
\10.1%
 

Most frequent Dash Punctuation characters

ValueCountFrequency (%) 
-103100.0%
 

Most frequent Decimal Number characters

ValueCountFrequency (%) 
150730.2%
 
243926.2%
 
31519.0%
 
41197.1%
 
71066.3%
 
5865.1%
 
6794.7%
 
8754.5%
 
9704.2%
 
0452.7%
 

Most frequent Open Punctuation characters

ValueCountFrequency (%) 
(2978.4%
 
[821.6%
 

Most frequent Close Punctuation characters

ValueCountFrequency (%) 
)2978.4%
 
]821.6%
 

Most frequent Connector Punctuation characters

ValueCountFrequency (%) 
_24100.0%
 

Most frequent Modifier Symbol characters

ValueCountFrequency (%) 
`19100.0%
 

Most occurring scripts

ValueCountFrequency (%) 
Latin58338091.8%
 
Common519748.2%
 

Most frequent Latin characters

ValueCountFrequency (%) 
a9629716.5%
 
i510388.7%
 
n408787.0%
 
e401126.9%
 
w313015.4%
 
K310865.3%
 
o292615.0%
 
u234334.0%
 
M214753.7%
 
l204003.5%
 
m169302.9%
 
h168552.9%
 
s164182.8%
 
r138492.4%
 
g125752.2%
 
t112981.9%
 
k107431.8%
 
S105191.8%
 
d100891.7%
 
b100611.7%
 
y75621.3%
 
z61621.1%
 
c49400.8%
 
N47100.8%
 
p34700.6%
 
Other values (27)419187.2%
 

Most frequent Common characters

ValueCountFrequency (%) 
4937695.0%
 
15071.0%
 
24390.8%
 
'3960.8%
 
.1740.3%
 
31510.3%
 
/1280.2%
 
41190.2%
 
71060.2%
 
-1030.2%
 
5860.2%
 
6790.2%
 
8750.1%
 
9700.1%
 
0450.1%
 
(290.1%
 
)290.1%
 
_24< 0.1%
 
`19< 0.1%
 
[8< 0.1%
 
]8< 0.1%
 
&2< 0.1%
 
\1< 0.1%
 

Most occurring blocks

ValueCountFrequency (%) 
ASCII635354100.0%
 

Most frequent ASCII characters

ValueCountFrequency (%) 
a9629715.2%
 
i510388.0%
 
493767.8%
 
n408786.4%
 
e401126.3%
 
w313014.9%
 
K310864.9%
 
o292614.6%
 
u234333.7%
 
M214753.4%
 
l204003.2%
 
m169302.7%
 
h168552.7%
 
s164182.6%
 
r138492.2%
 
g125752.0%
 
t112981.8%
 
k107431.7%
 
S105191.7%
 
d100891.6%
 
b100611.6%
 
y75621.2%
 
z61621.0%
 
c49400.8%
 
N47100.7%
 
Other values (50)479867.6%
 

num_private
Real number (ℝ≥0)

SKEWED
ZEROS

Distinct count: 65
Unique (%): 0.1%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 0.48906022087934986
Minimum: 0
Maximum: 1776
Zeros: 56831
Zeros (%): 98.7%
Memory size: 450.0 KiB

Quantile statistics

Minimum: 0
5-th percentile: 0
Q1: 0
Median: 0
Q3: 0
95-th percentile: 0
Maximum: 1776
Range: 1776
Interquartile range (IQR): 0

Descriptive statistics

Standard deviation: 12.42695441
Coefficient of variation (CV): 25.40986546
Kurtosis: 10798.07441
Mean: 0.4890602209
Median Absolute Deviation (MAD): 0
Skewness: 90.52355548
Sum: 28164
Variance: 154.429196
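[Histogram with fixed size bins (bins=10)]

Common values

Value | Count | Frequency (%)
0 | 56831 | 98.7%
6 | 81 | 0.1%
1 | 73 | 0.1%
5 | 46 | 0.1%
8 | 46 | 0.1%
32 | 40 | 0.1%
45 | 36 | 0.1%
15 | 35 | 0.1%
39 | 30 | 0.1%
93 | 28 | < 0.1%
3 | 27 | < 0.1%
7 | 26 | < 0.1%
2 | 23 | < 0.1%
65 | 22 | < 0.1%
47 | 21 | < 0.1%
102 | 20 | < 0.1%
4 | 20 | < 0.1%
17 | 17 | < 0.1%
80 | 15 | < 0.1%
20 | 14 | < 0.1%
25 | 12 | < 0.1%
11 | 11 | < 0.1%
41 | 10 | < 0.1%
34 | 10 | < 0.1%
16 | 8 | < 0.1%
Other values (40) | 86 | 0.1%

Minimum 10 values

Value | Count | Frequency (%)
0 | 56831 | 98.7%
1 | 73 | 0.1%
2 | 23 | < 0.1%
3 | 27 | < 0.1%
4 | 20 | < 0.1%
5 | 46 | 0.1%
6 | 81 | 0.1%
7 | 26 | < 0.1%
8 | 46 | 0.1%
9 | 4 | < 0.1%

Maximum 10 values

Value | Count | Frequency (%)
1776 | 1 | < 0.1%
1402 | 1 | < 0.1%
755 | 1 | < 0.1%
698 | 1 | < 0.1%
672 | 1 | < 0.1%
668 | 1 | < 0.1%
450 | 1 | < 0.1%
300 | 1 | < 0.1%
280 | 1 | < 0.1%
240 | 1 | < 0.1%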

basin
Categorical

Distinct count: 9
Unique (%): < 0.1%
Missing: 0
Missing (%): 0.0%
Memory size: 450.0 KiB

Common values

Value | Count | Frequency (%)
Pangani | 8940 | 15.5%
Lake Victoria | 8535 | 14.8%
Rufiji | 7976 | 13.9%
Internal | 7785 | 13.5%
Lake Tanganyika | 6333 | 11.0%
Wami / Ruvu | 5987 | 10.4%
Lake Nyasa | 5085 | 8.8%
Ruvuma / Southern Coast | 4493 | 7.8%
Lake Rukwa | 2454 | 4.3%

Length

Max length: 23
Median length: 10
Mean length: 10.82260193
Min length: 6
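
Because basin has only nine categories, the length statistics can be rebuilt from the frequency table alone. The sketch below weights each category's string length by its count; the variable names are illustrative, and the only inputs are the counts shown above.

```python
# Category counts copied from the basin frequency table above.
basin_counts = {
    "Pangani": 8940,
    "Lake Victoria": 8535,
    "Rufiji": 7976,
    "Internal": 7785,
    "Lake Tanganyika": 6333,
    "Wami / Ruvu": 5987,
    "Lake Nyasa": 5085,
    "Ruvuma / Southern Coast": 4493,
    "Lake Rukwa": 2454,
}

n_values = sum(basin_counts.values())
# Total number of characters across all rows of the column.
total_chars = sum(len(name) * count for name, count in basin_counts.items())
mean_length = total_chars / n_values
lengths = [len(name) for name in basin_counts]

print("rows:", n_values)
print("total characters:", total_chars)
print("mean length:", round(mean_length, 8))
print("min / max length:", min(lengths), "/", max(lengths))
```

The total of 623252 characters is the same figure this report gives for the ASCII block of this variable.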

Overview of Unicode Properties

Unique Unicode characters: 32
Unique Unicode categories: 4
Unique Unicode scripts: 2
Unique Unicode blocks: 1
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.

Most occurring characters

Value | Count | Frequency (%)
a | 103203 | 16.6%
i | 54282 | 8.7%
n | 50609 | 8.1%
(space) | 47860 | 7.7%
u | 35883 | 5.8%
e | 34685 | 5.6%
k | 31194 | 5.0%
t | 25306 | 4.1%
L | 22407 | 3.6%
R | 20910 | 3.4%
r | 20813 | 3.3%
o | 17521 | 2.8%
g | 15273 | 2.5%
y | 11418 | 1.8%
v | 10480 | 1.7%
m | 10480 | 1.7%
/ | 10480 | 1.7%
s | 9578 | 1.5%
P | 8940 | 1.4%
V | 8535 | 1.4%
c | 8535 | 1.4%
f | 7976 | 1.3%
j | 7976 | 1.3%
I | 7785 | 1.2%
l | 7785 | 1.2%
Other values (7) | 33338 | 5.3%

Most occurring categories

Value | Count | Frequency (%)
Lowercase Letter | 469944 | 75.4%
Uppercase Letter | 94968 | 15.2%
Space Separator | 47860 | 7.7%
Other Punctuation | 10480 | 1.7%

Most frequent Uppercase Letter characters

Value | Count | Frequency (%)
L | 22407 | 23.6%
R | 20910 | 22.0%
P | 8940 | 9.4%
V | 8535 | 9.0%
I | 7785 | 8.2%
T | 6333 | 6.7%
W | 5987 | 6.3%
N | 5085 | 5.4%
S | 4493 | 4.7%
C | 4493 | 4.7%

Most frequent Lowercase Letter characters

Value | Count | Frequency (%)
a | 103203 | 22.0%
i | 54282 | 11.6%
n | 50609 | 10.8%
u | 35883 | 7.6%
e | 34685 | 7.4%
k | 31194 | 6.6%
t | 25306 | 5.4%
r | 20813 | 4.4%
o | 17521 | 3.7%
g | 15273 | 3.2%
y | 11418 | 2.4%
v | 10480 | 2.2%
m | 10480 | 2.2%
s | 9578 | 2.0%
c | 8535 | 1.8%
f | 7976 | 1.7%
j | 7976 | 1.7%
l | 7785 | 1.7%
h | 4493 | 1.0%
w | 2454 | 0.5%

Most frequent Space Separator characters

Value | Count | Frequency (%)
(space) | 47860 | 100.0%

Most frequent Other Punctuation characters

Value | Count | Frequency (%)
/ | 10480 | 100.0%

Most occurring scripts

Value | Count | Frequency (%)
Latin | 564912 | 90.6%
Common | 58340 | 9.4%

Most frequent Latin characters

Value | Count | Frequency (%)
a | 103203 | 18.3%
i | 54282 | 9.6%
n | 50609 | 9.0%
u | 35883 | 6.4%
e | 34685 | 6.1%
k | 31194 | 5.5%
t | 25306 | 4.5%
L | 22407 | 4.0%
R | 20910 | 3.7%
r | 20813 | 3.7%
o | 17521 | 3.1%
g | 15273 | 2.7%
y | 11418 | 2.0%
v | 10480 | 1.9%
m | 10480 | 1.9%
s | 9578 | 1.7%
P | 8940 | 1.6%
V | 8535 | 1.5%
c | 8535 | 1.5%
f | 7976 | 1.4%
j | 7976 | 1.4%
I | 7785 | 1.4%
l | 7785 | 1.4%
T | 6333 | 1.1%
W | 5987 | 1.1%
Other values (5) | 21018 | 3.7%

Most frequent Common characters

Value | Count | Frequency (%)
(space) | 47860 | 82.0%
/ | 10480 | 18.0%

Most occurring blocks

Value | Count | Frequency (%)
ASCII | 623252 | 100.0%

Most frequent ASCII characters

Value | Count | Frequency (%)
a | 103203 | 16.6%
i | 54282 | 8.7%
n | 50609 | 8.1%
(space) | 47860 | 7.7%
u | 35883 | 5.8%
e | 34685 | 5.6%
k | 31194 | 5.0%
t | 25306 | 4.1%
L | 22407 | 3.6%
R | 20910 | 3.4%
r | 20813 | 3.3%
o | 17521 | 2.8%
g | 15273 | 2.5%
y | 11418 | 1.8%
v | 10480 | 1.7%
m | 10480 | 1.7%
/ | 10480 | 1.7%
s | 9578 | 1.5%
P | 8940 | 1.4%
V | 8535 | 1.4%
c | 8535 | 1.4%
f | 7976 | 1.3%
j | 7976 | 1.3%
I | 7785 | 1.2%
l | 7785 | 1.2%
Other values (7) | 33338 | 5.3%

subvillage
Categorical

HIGH CARDINALITY

Distinct count    18567
Unique (%)        32.5%
Missing           371
Missing (%)       0.6%
Memory size       450.0 KiB

Value                   Count    Frequency (%)
Majengo                 494      0.9%
Shuleni                 492      0.9%
Madukani                435      0.8%
Kati                    366      0.6%
Mtakuja                 257      0.4%
Sokoni                  228      0.4%
M                       187      0.3%
Muungano                170      0.3%
Mbuyuni                 164      0.3%
Mlimani                 147      0.3%
Songambele              135      0.2%
Msikitini               134      0.2%
Miembeni                134      0.2%
1                       132      0.2%
Kibaoni                 114      0.2%
Kanisani                110      0.2%
I                       109      0.2%
Mapinduzi               109      0.2%
Mjimwema                108      0.2%
Mjini                   104      0.2%
Mkwajuni                104      0.2%
Mwenge                  101      0.2%
Azimio                  98       0.2%
Mabatini                97       0.2%
Bwawani                 91       0.2%
Other values (18542)    52597    91.3%
(Missing)               371      0.6%

Length

Max length        30
Median length     7
Mean length       7.856029034
Min length        1

Overview of Unicode Properties

Unique unicode characters    73
Unique unicode categories    10
Unique unicode scripts       2
Unique unicode blocks        1
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.
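
The category names used in the tables that follow correspond to Unicode general categories (for example, "Uppercase Letter" is `Lu`, "Lowercase Letter" is `Ll`, "Space Separator" is `Zs`). As a minimal sketch, and not the report generator's own code, Python's standard `unicodedata` module can produce a per-category character count of this kind. The helper name `category_counts` and the four-value sample are chosen here purely for illustration.

```python
import unicodedata
from collections import Counter


def category_counts(values):
    """Count characters per Unicode general category code (e.g. 'Lu', 'Ll')."""
    counts = Counter()
    for value in values:
        for ch in value:
            counts[unicodedata.category(ch)] += 1
    return counts


# Illustrative sample only: four of the most frequent subvillage values.
sample = ["Majengo", "Shuleni", "Madukani", "Kati"]
counts = category_counts(sample)

for code, n in counts.most_common():
    print(code, n)
```

The standard library exposes general categories but not scripts or blocks, so the script and block tables would need a different data source.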

Most occurring characters

ValueCountFrequency (%) 
a6968015.4%
 
i445049.8%
 
n333077.4%
 
u254615.6%
 
e250065.5%
 
o230225.1%
 
M197364.4%
 
g183454.1%
 
l156553.5%
 
m144913.2%
 
K123862.7%
 
115172.5%
 
t114342.5%
 
b113562.5%
 
k107792.4%
 
r98402.2%
 
s95552.1%
 
w94252.1%
 
h91622.0%
 
d79341.8%
 
y67231.5%
 
N57611.3%
 
B48561.1%
 
I43391.0%
 
j42070.9%
 
Other values (48)339327.5%
 

Most occurring categories

ValueCountFrequency (%) 
Lowercase Letter37002081.8%
 
Uppercase Letter6912615.3%
 
Space Separator115172.5%
 
Other Punctuation10780.2%
 
Decimal Number5780.1%
 
Modifier Symbol45< 0.1%
 
Dash Punctuation36< 0.1%
 
Open Punctuation5< 0.1%
 
Close Punctuation5< 0.1%
 
Connector Punctuation3< 0.1%
 

Most frequent Uppercase Letter characters

ValueCountFrequency (%) 
M1973628.6%
 
K1238617.9%
 
N57618.3%
 
B48567.0%
 
I43396.3%
 
S39015.6%
 
A29714.3%
 
C24463.5%
 
L24153.5%
 
U16782.4%
 
T11071.6%
 
W10511.5%
 
R9011.3%
 
O8721.3%
 
G8551.2%
 
J7281.1%
 
D6220.9%
 
P4860.7%
 
H4570.7%
 
E3570.5%
 
Z3510.5%
 
V3330.5%
 
Y2760.4%
 
F1740.3%
 
Q670.1%
 

Most frequent Lowercase Letter characters

ValueCountFrequency (%) 
a6968018.8%
 
i4450412.0%
 
n333079.0%
 
u254616.9%
 
e250066.8%
 
o230226.2%
 
g183455.0%
 
l156554.2%
 
m144913.9%
 
t114343.1%
 
b113563.1%
 
k107792.9%
 
r98402.7%
 
s95552.6%
 
w94252.5%
 
h91622.5%
 
d79342.1%
 
y67231.8%
 
j42071.1%
 
z36061.0%
 
p27680.7%
 
c15600.4%
 
f10890.3%
 
v10450.3%
 
q62< 0.1%
 

Most frequent Space Separator characters

ValueCountFrequency (%) 
11517100.0%
 

Most frequent Other Punctuation characters

ValueCountFrequency (%) 
'94587.7%
 
/1039.6%
 
.282.6%
 
#20.2%
 

Most frequent Decimal Number characters

ValueCountFrequency (%) 
123741.0%
 
27012.1%
 
3508.7%
 
4447.6%
 
6335.7%
 
8325.5%
 
9325.5%
 
0305.2%
 
5284.8%
 
7223.8%
 

Most frequent Modifier Symbol characters

ValueCountFrequency (%) 
`45100.0%
 

Most frequent Open Punctuation characters

ValueCountFrequency (%) 
(480.0%
 
[120.0%
 

Most frequent Close Punctuation characters

ValueCountFrequency (%) 
)480.0%
 
]120.0%
 

Most frequent Dash Punctuation characters

ValueCountFrequency (%) 
-36100.0%
 

Most frequent Connector Punctuation characters

ValueCountFrequency (%) 
_3100.0%
 

Most occurring scripts

ValueCountFrequency (%) 
Latin43914697.1%
 
Common132672.9%
 

Most frequent Latin characters

ValueCountFrequency (%) 
a6968015.9%
 
i4450410.1%
 
n333077.6%
 
u254615.8%
 
e250065.7%
 
o230225.2%
 
M197364.5%
 
g183454.2%
 
l156553.6%
 
m144913.3%
 
K123862.8%
 
t114342.6%
 
b113562.6%
 
k107792.5%
 
r98402.2%
 
s95552.2%
 
w94252.1%
 
h91622.1%
 
d79341.8%
 
y67231.5%
 
N57611.3%
 
B48561.1%
 
I43391.0%
 
j42071.0%
 
S39010.9%
 
Other values (26)282816.4%
 

Most frequent Common characters

ValueCountFrequency (%) 
1151786.8%
 
'9457.1%
 
12371.8%
 
/1030.8%
 
2700.5%
 
3500.4%
 
`450.3%
 
4440.3%
 
-360.3%
 
6330.2%
 
8320.2%
 
9320.2%
 
0300.2%
 
5280.2%
 
.280.2%
 
7220.2%
 
(4< 0.1%
 
)4< 0.1%
 
_3< 0.1%
 
#2< 0.1%
 
[1< 0.1%
 
]1< 0.1%
 

Most occurring blocks

ValueCountFrequency (%) 
ASCII452413100.0%
 

Most frequent ASCII characters

ValueCountFrequency (%) 
a6968015.4%
 
i445049.8%
 
n333077.4%
 
u254615.6%
 
e250065.5%
 
o230225.1%
 
M197364.4%
 
g183454.1%
 
l156553.5%
 
m144913.2%
 
K123862.7%
 
115172.5%
 
t114342.5%
 
b113562.5%
 
k107792.4%
 
r98402.2%
 
s95552.1%
 
w94252.1%
 
h91622.0%
 
d79341.8%
 
y67231.5%
 
N57611.3%
 
B48561.1%
 
I43391.0%
 
j42070.9%
 
Other values (48)339327.5%
 

region
Categorical

Distinct count    21
Unique (%)        < 0.1%
Missing           0
Missing (%)       0.0%
Memory size       450.0 KiB

Value            Count    Frequency (%)
Iringa           5294     9.2%
Mbeya            4639     8.1%
Kilimanjaro      4379     7.6%
Morogoro         4006     7.0%
Shinyanga        3977     6.9%
Arusha           3350     5.8%
Kagera           3316     5.8%
Kigoma           2816     4.9%
Ruvuma           2640     4.6%
Pwani            2635     4.6%
Tanga            2547     4.4%
Mwanza           2295     4.0%
Dodoma           2201     3.8%
Singida          2093     3.6%
Mara             1969     3.4%
Tabora           1959     3.4%
Rukwa            1808     3.1%
Mtwara           1730     3.0%
Manyara          1583     2.7%
Lindi            1546     2.7%
Dar es Salaam    805      1.4%

Length

Max length        13
Median length     6
Mean length       6.591025908
Min length        4

Overview of Unicode Properties

Unique unicode characters    32
Unique unicode categories    3
Unique unicode scripts       2
Unique unicode blocks        1
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.

Most occurring characters

ValueCountFrequency (%) 
a7978921.0%
 
r323978.5%
 
i307588.1%
 
n303268.0%
 
o295807.8%
 
g240496.3%
 
M162224.3%
 
m128413.4%
 
K105112.8%
 
u104382.7%
 
y101992.7%
 
e87602.3%
 
w84682.2%
 
h73271.9%
 
S68751.8%
 
b65981.7%
 
d58401.5%
 
I52941.4%
 
l51841.4%
 
T45061.2%
 
R44481.2%
 
j43791.2%
 
s41551.1%
 
A33500.9%
 
D30060.8%
 
Other values (7)142643.8%
 

Most occurring categories

ValueCountFrequency (%) 
Lowercase Letter31956184.2%
 
Uppercase Letter5839315.4%
 
Space Separator16100.4%
 

Most frequent Uppercase Letter characters

ValueCountFrequency (%) 
M1622227.8%
 
K1051118.0%
 
S687511.8%
 
I52949.1%
 
T45067.7%
 
R44487.6%
 
A33505.7%
 
D30065.1%
 
P26354.5%
 
L15462.6%
 

Most frequent Lowercase Letter characters

ValueCountFrequency (%) 
a7978925.0%
 
r3239710.1%
 
i307589.6%
 
n303269.5%
 
o295809.3%
 
g240497.5%
 
m128414.0%
 
u104383.3%
 
y101993.2%
 
e87602.7%
 
w84682.6%
 
h73272.3%
 
b65982.1%
 
d58401.8%
 
l51841.6%
 
j43791.4%
 
s41551.3%
 
v26400.8%
 
z22950.7%
 
k18080.6%
 
t17300.5%
 

Most frequent Space Separator characters

ValueCountFrequency (%) 
1610100.0%
 

Most occurring scripts

ValueCountFrequency (%) 
Latin37795499.6%
 
Common16100.4%
 

Most frequent Latin characters

ValueCountFrequency (%) 
a7978921.1%
 
r323978.6%
 
i307588.1%
 
n303268.0%
 
o295807.8%
 
g240496.4%
 
M162224.3%
 
m128413.4%
 
K105112.8%
 
u104382.8%
 
y101992.7%
 
e87602.3%
 
w84682.2%
 
h73271.9%
 
S68751.8%
 
b65981.7%
 
d58401.5%
 
I52941.4%
 
l51841.4%
 
T45061.2%
 
R44481.2%
 
j43791.2%
 
s41551.1%
 
A33500.9%
 
D30060.8%
 
Other values (6)126543.3%
 

Most frequent Common characters

ValueCountFrequency (%) 
1610100.0%
 

Most occurring blocks

ValueCountFrequency (%) 
ASCII379564100.0%
 

Most frequent ASCII characters

ValueCountFrequency (%) 
a7978921.0%
 
r323978.5%
 
i307588.1%
 
n303268.0%
 
o295807.8%
 
g240496.3%
 
M162224.3%
 
m128413.4%
 
K105112.8%
 
u104382.7%
 
y101992.7%
 
e87602.3%
 
w84682.2%
 
h73271.9%
 
S68751.8%
 
b65981.7%
 
d58401.5%
 
I52941.4%
 
l51841.4%
 
T45061.2%
 
R44481.2%
 
j43791.2%
 
s41551.1%
 
A33500.9%
 
D30060.8%
 
Other values (7)142643.8%
 

region_code
Real number (ℝ≥0)

Distinct count    27
Unique (%)        < 0.1%
Missing           0
Missing (%)       0.0%
Infinite          0
Infinite (%)      0.0%
Mean              15.217614780857122
Minimum           1
Maximum           99
Zeros             0
Zeros (%)         0.0%
Memory size       450.0 KiB

Quantile statistics

Minimum                      1
5-th percentile              2
Q1                           5
median                       12
Q3                           17
95-th percentile             60
Maximum                      99
Range                        98
Interquartile range (IQR)    12

Descriptive statistics

Standard deviation               17.85525395
Coefficient of variation (CV)    1.173328028
Kurtosis                         9.958197847
Mean                             15.21761478
Median Absolute Deviation (MAD)  6
Skewness                         3.141767441
Sum                              876352
Variance                         318.8100935
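
Several of the figures above are derived from others in the same block. The following sketch re-derives them from the reported values using the usual definitions (range = max − min, IQR = Q3 − Q1, CV = standard deviation / mean, variance = standard deviation squared). The variable names are illustrative, and the inputs are copied from the region_code summary rather than recomputed from the data.

```python
# Values copied from the region_code summary in this report.
minimum, maximum = 1, 99
q1, q3 = 5, 17
mean = 15.21761478
std = 17.85525395

value_range = maximum - minimum   # reported Range: 98
iqr = q3 - q1                     # reported IQR: 12
cv = std / mean                   # reported CV: 1.173328028
variance = std ** 2               # reported Variance: 318.8100935

print(value_range, iqr, round(cv, 6), round(variance, 4))
```

A CV above 1 together with a 95-th percentile of 60 against a median of 12 indicates a small group of large codes (60, 80, 90, 99) pulling the mean well above the median.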
Histogram with fixed size bins (bins=10)
Common values

Value               Count    Frequency (%)
11                  5297     9.2%
12                  4639     8.1%
3                   4379     7.6%
5                   4040     7.0%
17                  3954     6.9%
18                  3324     5.8%
2                   3024     5.3%
16                  2816     4.9%
10                  2640     4.6%
4                   2513     4.4%
19                  2295     4.0%
1                   2201     3.8%
13                  2093     3.6%
14                  1979     3.4%
20                  1969     3.4%
15                  1808     3.1%
6                   1609     2.8%
21                  1583     2.7%
80                  1238     2.1%
60                  1025     1.8%
90                  917      1.6%
7                   805      1.4%
99                  423      0.7%
9                   390      0.7%
24                  326      0.6%
Other values (2)    301      0.5%

Minimum 10 values

Value    Count    Frequency (%)
1        2201     3.8%
2        3024     5.3%
3        4379     7.6%
4        2513     4.4%
5        4040     7.0%
6        1609     2.8%
7        805      1.4%
8        300      0.5%
9        390      0.7%
10       2640     4.6%

Maximum 10 values

Value    Count    Frequency (%)
99       423      0.7%
90       917      1.6%
80       1238     2.1%
60       1025     1.8%
40       1        < 0.1%
24       326      0.6%
21       1583     2.7%
20       1969     3.4%
19       2295     4.0%
18       3324     5.8%

district_code
Real number (ℝ≥0)

Distinct count    20
Unique (%)        < 0.1%
Missing           0
Missing (%)       0.0%
Infinite          0
Infinite (%)      0.0%
Mean              5.728311453775092
Minimum           0
Maximum           80
Zeros             23
Zeros (%)         < 0.1%
Memory size       450.0 KiB

Quantile statistics

Minimum                      0
5-th percentile              1
Q1                           2
median                       3
Q3                           5
95-th percentile             30
Maximum                      80
Range                        80
Interquartile range (IQR)    3

Descriptive statistics

Standard deviation               9.7602542
Coefficient of variation (CV)    1.703862347
Kurtosis                         15.65118912
Mean                             5.728311454
Median Absolute Deviation (MAD)  1
Skewness                         3.901635064
Sum                              329882
Variance                         95.26256205
Histogram with fixed size bins (bins=10)
Common values

Value    Count    Frequency (%)
1        11146    19.4%
2        10909    18.9%
3        9998     17.4%
4        8996     15.6%
5        4356     7.6%
6        3586     6.2%
7        3343     5.8%
8        1043     1.8%
30       995      1.7%
33       874      1.5%
53       745      1.3%
43       505      0.9%
13       391      0.7%
23       293      0.5%
63       195      0.3%
62       109      0.2%
60       63       0.1%
0        23       < 0.1%
80       12       < 0.1%
67       6        < 0.1%

Minimum 10 values

Value    Count    Frequency (%)
0        23       < 0.1%
1        11146    19.4%
2        10909    18.9%
3        9998     17.4%
4        8996     15.6%
5        4356     7.6%
6        3586     6.2%
7        3343     5.8%
8        1043     1.8%
13       391      0.7%

Maximum 10 values

Value    Count    Frequency (%)
80       12       < 0.1%
67       6        < 0.1%
63       195      0.3%
62       109      0.2%
60       63       0.1%
53       745      1.3%
43       505      0.9%
33       874      1.5%
30       995      1.7%
23       293      0.5%

lga
Categorical

HIGH CARDINALITY

Distinct count    124
Unique (%)        0.2%
Missing           0
Missing (%)       0.0%
Memory size       450.0 KiB

Value                Count    Frequency (%)
Njombe               2503     4.3%
Arusha Rural         1252     2.2%
Moshi Rural          1251     2.2%
Rungwe               1106     1.9%
Kilosa               1094     1.9%
Kasulu               1047     1.8%
Mbozi                1034     1.8%
Meru                 1009     1.8%
Bagamoyo             997      1.7%
Singida Rural        995      1.7%
Kilombero            959      1.7%
Same                 877      1.5%
Kibondo              874      1.5%
Kyela                859      1.5%
Kahama               836      1.5%
Kigoma Rural         824      1.4%
Maswa                809      1.4%
Karagwe              771      1.3%
Mbinga               750      1.3%
Iringa Rural         728      1.3%
Serengeti            716      1.2%
Namtumbo             694      1.2%
Lushoto              694      1.2%
Songea Rural         693      1.2%
Mpanda               679      1.2%
Other values (99)    33537    58.2%
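
The frequency table lists the 25 most common values and folds the remaining 99 into a single "Other values" row. A minimal sketch of that kind of summary, assuming a plain list of category values, is shown below. The function name `top_n_with_other` and the toy data are hypothetical and not taken from the report or the dataset.

```python
from collections import Counter


def top_n_with_other(values, n):
    """Return the n most common (value, count) pairs plus an aggregated remainder row."""
    counts = Counter(values)
    top = counts.most_common(n)
    shown = sum(count for _, count in top)
    n_other = len(counts) - len(top)
    rows = list(top)
    if n_other > 0:
        # Everything outside the top n is folded into one row.
        rows.append((f"Other values ({n_other})", len(values) - shown))
    return rows


# Toy data for illustration only.
toy = ["Njombe"] * 3 + ["Rungwe"] * 2 + ["Kilosa", "Kasulu"]
rows = top_n_with_other(toy, 2)

for value, count in rows:
    print(value, count)
```

The same idea is a common way to reduce a high-cardinality column before encoding it, although the cutoff is a modelling choice that this report does not prescribe.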
 

Length

Max length        16
Median length     6
Mean length       7.463568799
Min length        3

Overview of Unicode Properties

Unique unicode characters    40
Unique unicode categories    3
Unique unicode scripts       2
Unique unicode blocks        1
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.

Most occurring characters

ValueCountFrequency (%) 
a6716515.6%
 
o300797.0%
 
u280056.5%
 
i269856.3%
 
r258816.0%
 
n225215.2%
 
e220915.1%
 
l192384.5%
 
g180664.2%
 
M156983.7%
 
m156223.6%
 
b156033.6%
 
R122072.8%
 
K116632.7%
 
112352.6%
 
w98202.3%
 
s97472.3%
 
h84642.0%
 
d74051.7%
 
S62611.5%
 
N57601.3%
 
t52081.2%
 
y47631.1%
 
B38340.9%
 
k37210.9%
 
Other values (15)227705.3%
 

Most occurring categories

ValueCountFrequency (%) 
Lowercase Letter34975481.4%
 
Uppercase Letter6882316.0%
 
Space Separator112352.6%
 

Most frequent Uppercase Letter characters

ValueCountFrequency (%) 
M1569822.8%
 
R1220717.7%
 
K1166316.9%
 
S62619.1%
 
N57608.4%
 
B38345.6%
 
U34105.0%
 
I24803.6%
 
L21313.1%
 
T13672.0%
 
A13151.9%
 
H11531.7%
 
C8811.3%
 
D3580.5%
 
P3050.4%
 

Most frequent Lowercase Letter characters

ValueCountFrequency (%) 
a6716519.2%
 
o300798.6%
 
u280058.0%
 
i269857.7%
 
r258817.4%
 
n225216.4%
 
e220916.3%
 
l192385.5%
 
g180665.2%
 
m156224.5%
 
b156034.5%
 
w98202.8%
 
s97472.8%
 
h84642.4%
 
d74052.1%
 
t52081.5%
 
y47631.4%
 
k37211.1%
 
j34961.0%
 
z19430.6%
 
p18540.5%
 
f11060.3%
 
v6710.2%
 
c3000.1%
 

Most frequent Space Separator characters

ValueCountFrequency (%) 
11235100.0%
 

Most occurring scripts

ValueCountFrequency (%) 
Latin41857797.4%
 
Common112352.6%
 

Most frequent Latin characters

ValueCountFrequency (%) 
a6716516.0%
 
o300797.2%
 
u280056.7%
 
i269856.4%
 
r258816.2%
 
n225215.4%
 
e220915.3%
 
l192384.6%
 
g180664.3%
 
M156983.8%
 
m156223.7%
 
b156033.7%
 
R122072.9%
 
K116632.8%
 
w98202.3%
 
s97472.3%
 
h84642.0%
 
d74051.8%
 
S62611.5%
 
N57601.4%
 
t52081.2%
 
y47631.1%
 
B38340.9%
 
k37210.9%
 
j34960.8%
 
Other values (14)192744.6%
 

Most frequent Common characters

ValueCountFrequency (%) 
11235100.0%
 

Most occurring blocks

ValueCountFrequency (%) 
ASCII429812100.0%
 

Most frequent ASCII characters

ValueCountFrequency (%) 
a6716515.6%
 
o300797.0%
 
u280056.5%
 
i269856.3%
 
r258816.0%
 
n225215.2%
 
e220915.1%
 
l192384.5%
 
g180664.2%
 
M156983.7%
 
m156223.6%
 
b156033.6%
 
R122072.8%
 
K116632.7%
 
112352.6%
 
w98202.3%
 
s97472.3%
 
h84642.0%
 
d74051.7%
 
S62611.5%
 
N57601.3%
 
t52081.2%
 
y47631.1%
 
B38340.9%
 
k37210.9%
 
Other values (15)227705.3%
 

ward
Categorical

HIGH CARDINALITY

Distinct count    2033
Unique (%)        3.5%
Missing           0
Missing (%)       0.0%
Memory size       450.0 KiB

Value                  Count    Frequency (%)
Igosi                  307      0.5%
Imalinyi               252      0.4%
Siha Kati              232      0.4%
Mdandu                 231      0.4%
Nduruma                217      0.4%
Kitunda                203      0.4%
Mishamo                203      0.4%
Msindo                 201      0.3%
Chalinze               196      0.3%
Maji ya Chai           190      0.3%
Usuka                  187      0.3%
Ngarenanyuki           172      0.3%
Chanika                171      0.3%
Vikindu                162      0.3%
Mtwango                153      0.3%
Matola                 145      0.3%
Zinga/Ikerege          141      0.2%
Wanging'ombe           139      0.2%
Maramba                139      0.2%
Itete                  137      0.2%
Magomeni               135      0.2%
Kikatiti               134      0.2%
Ifakara                134      0.2%
Olkokola               133      0.2%
Maposeni               130      0.2%
Other values (2008)    53144    92.3%

Length

Max length        23
Median length     7
Mean length       7.500312565
Min length        3

Overview of Unicode Properties

Unique unicode characters    54
Unique unicode categories    5
Unique unicode scripts       2
Unique unicode blocks        1
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.

Most occurring characters

ValueCountFrequency (%) 
a6668915.4%
 
i391179.1%
 
n287956.7%
 
u261306.0%
 
o253965.9%
 
e232075.4%
 
g205594.8%
 
M186034.3%
 
m156163.6%
 
l144853.4%
 
r128563.0%
 
b125632.9%
 
s111262.6%
 
K108472.5%
 
h105582.4%
 
k103162.4%
 
t91982.1%
 
d87512.0%
 
w86242.0%
 
y69621.6%
 
I60301.4%
 
N55501.3%
 
54081.3%
 
z35480.8%
 
S31570.7%
 
Other values (29)278376.4%
 

Most occurring categories

ValueCountFrequency (%) 
Lowercase Letter36271084.0%
 
Uppercase Letter6271114.5%
 
Space Separator54081.3%
 
Other Punctuation10760.2%
 
Dash Punctuation23< 0.1%
 

Most frequent Uppercase Letter characters

ValueCountFrequency (%) 
M1860329.7%
 
K1084717.3%
 
I60309.6%
 
N55508.9%
 
S31575.0%
 
L30594.9%
 
U29134.6%
 
B29024.6%
 
C19963.2%
 
R16922.7%
 
T7761.2%
 
D7171.1%
 
O6611.1%
 
V6341.0%
 
P5770.9%
 
H5510.9%
 
W3870.6%
 
G3520.6%
 
Z3040.5%
 
E2890.5%
 
A2600.4%
 
J1870.3%
 
Y1490.2%
 
Q760.1%
 
F420.1%
 

Most frequent Lowercase Letter characters

ValueCountFrequency (%) 
a6668918.4%
 
i3911710.8%
 
n287957.9%
 
u261307.2%
 
o253967.0%
 
e232076.4%
 
g205595.7%
 
m156164.3%
 
l144854.0%
 
r128563.5%
 
b125633.5%
 
s111263.1%
 
h105582.9%
 
k103162.8%
 
t91982.5%
 
d87512.4%
 
w86242.4%
 
y69621.9%
 
z35481.0%
 
p28100.8%
 
j24370.7%
 
c13640.4%
 
f8100.2%
 
v7770.2%
 
q16< 0.1%
 

Most frequent Space Separator characters

ValueCountFrequency (%) 
5408100.0%
 

Most frequent Other Punctuation characters

ValueCountFrequency (%) 
'92686.1%
 
/15013.9%
 

Most frequent Dash Punctuation characters

ValueCountFrequency (%) 
-23100.0%
 

Most occurring scripts

ValueCountFrequency (%) 
Latin42542198.5%
 
Common65071.5%
 

Most frequent Latin characters

ValueCountFrequency (%) 
a6668915.7%
 
i391179.2%
 
n287956.8%
 
u261306.1%
 
o253966.0%
 
e232075.5%
 
g205594.8%
 
M186034.4%
 
m156163.7%
 
l144853.4%
 
r128563.0%
 
b125633.0%
 
s111262.6%
 
K108472.5%
 
h105582.5%
 
k103162.4%
 
t91982.2%
 
d87512.1%
 
w86242.0%
 
y69621.6%
 
I60301.4%
 
N55501.3%
 
z35480.8%
 
S31570.7%
 
L30590.7%
 
Other values (25)236795.6%
 

Most frequent Common characters

ValueCountFrequency (%) 
540883.1%
 
'92614.2%
 
/1502.3%
 
-230.4%
 

Most occurring blocks

ValueCountFrequency (%) 
ASCII431928100.0%
 

Most frequent ASCII characters

ValueCountFrequency (%) 
a6668915.4%
 
i391179.1%
 
n287956.7%
 
u261306.0%
 
o253965.9%
 
e232075.4%
 
g205594.8%
 
M186034.3%
 
m156163.6%
 
l144853.4%
 
r128563.0%
 
b125632.9%
 
s111262.6%
 
K108472.5%
 
h105582.4%
 
k103162.4%
 
t91982.1%
 
d87512.0%
 
w86242.0%
 
y69621.6%
 
I60301.4%
 
N55501.3%
 
54081.3%
 
z35480.8%
 
S31570.7%
 
Other values (29)278376.4%
 

population
Real number (ℝ≥0)

ZEROS

Distinct count    1049
Unique (%)        1.8%
Missing           0
Missing (%)       0.0%
Infinite          0
Infinite (%)      0.0%
Mean              185.57083072862403
Minimum           0
Maximum           30500
Zeros             19569
Zeros (%)         34.0%
Memory size       450.0 KiB

Quantile statistics

Minimum                      0
5-th percentile              0
Q1                           0
median                       35
Q3                           230
95-th percentile             700
Maximum                      30500
Range                        30500
Interquartile range (IQR)    230

Descriptive statistics

Standard deviation               477.7442395
Coefficient of variation (CV)    2.574457622
Kurtosis                         392.9484634
Mean                             185.5708307
Median Absolute Deviation (MAD)  35
Skewness                         12.51841497
Sum                              10686653
Variance                         228239.5584
Histogram with fixed size bins (bins=10)
Common values

Value                  Count    Frequency (%)
0                      19569    34.0%
1                      7025     12.2%
200                    1940     3.4%
150                    1892     3.3%
250                    1681     2.9%
300                    1476     2.6%
100                    1146     2.0%
50                     1139     2.0%
500                    1009     1.8%
350                    986      1.7%
120                    916      1.6%
400                    775      1.3%
60                     706      1.2%
30                     626      1.1%
40                     552      1.0%
80                     533      0.9%
450                    499      0.9%
20                     462      0.8%
600                    438      0.8%
230                    388      0.7%
75                     289      0.5%
1000                   278      0.5%
800                    269      0.5%
90                     265      0.5%
130                    264      0.5%
Other values (1024)    12465    21.6%

Minimum 10 values

Value    Count    Frequency (%)
0        19569    34.0%
1        7025     12.2%
2        4        < 0.1%
3        4        < 0.1%
4        13       < 0.1%
5        44       0.1%
6        19       < 0.1%
7        3        < 0.1%
8        23       < 0.1%
9        11       < 0.1%

Maximum 10 values

Value    Count    Frequency (%)
30500    1        < 0.1%
15300    1        < 0.1%
11463    1        < 0.1%
10000    3        < 0.1%
9865     1        < 0.1%
9500     1        < 0.1%
9000     3        < 0.1%
8848     1        < 0.1%
8600     1        < 0.1%
8500     1        < 0.1%

public_meeting
Boolean

MISSING

Distinct count    2
Unique (%)        < 0.1%
Missing           2976
Missing (%)       5.2%
Memory size       450.0 KiB

Value        Count    Frequency (%)
True         49737    86.4%
False        4875     8.5%
(Missing)    2976     5.2%

recorded_by
Categorical

CONSTANT
REJECTED

Distinct count    1
Unique (%)        < 0.1%
Missing           0
Missing (%)       0.0%
Memory size       450.0 KiB

Value                      Count    Frequency (%)
GeoData Consultants Ltd    57588    100.0%

Length

Max length        23
Median length     23
Mean length       23
Min length        23

Overview of Unicode Properties

Unique unicode characters    14
Unique unicode categories    3
Unique unicode scripts       2
Unique unicode blocks        1
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.

Most occurring characters

ValueCountFrequency (%) 
t23035217.4%
 
a17276413.0%
 
o1151768.7%
 
1151768.7%
 
n1151768.7%
 
s1151768.7%
 
G575884.3%
 
e575884.3%
 
D575884.3%
 
C575884.3%
 
u575884.3%
 
l575884.3%
 
L575884.3%
 
d575884.3%
 

Most occurring categories

ValueCountFrequency (%) 
Lowercase Letter97899673.9%
 
Uppercase Letter23035217.4%
 
Space Separator1151768.7%
 

Most frequent Uppercase Letter characters

ValueCountFrequency (%) 
G5758825.0%
 
D5758825.0%
 
C5758825.0%
 
L5758825.0%
 

Most frequent Lowercase Letter characters

ValueCountFrequency (%) 
t23035223.5%
 
a17276417.6%
 
o11517611.8%
 
n11517611.8%
 
s11517611.8%
 
e575885.9%
 
u575885.9%
 
l575885.9%
 
d575885.9%
 

Most frequent Space Separator characters

ValueCountFrequency (%) 
115176100.0%
 

Most occurring scripts

ValueCountFrequency (%) 
Latin120934891.3%
 
Common1151768.7%
 

Most frequent Latin characters

ValueCountFrequency (%) 
t23035219.0%
 
a17276414.3%
 
o1151769.5%
 
n1151769.5%
 
s1151769.5%
 
G575884.8%
 
e575884.8%
 
D575884.8%
 
C575884.8%
 
u575884.8%
 
l575884.8%
 
L575884.8%
 
d575884.8%
 

Most frequent Common characters

ValueCountFrequency (%) 
115176100.0%
 

Most occurring blocks

ValueCountFrequency (%) 
ASCII1324524100.0%
 

Most frequent ASCII characters

ValueCountFrequency (%) 
t23035217.4%
 
a17276413.0%
 
o1151768.7%
 
1151768.7%
 
n1151768.7%
 
s1151768.7%
 
G575884.3%
 
e575884.3%
 
D575884.3%
 
C575884.3%
 
u575884.3%
 
l575884.3%
 
L575884.3%
 
d575884.3%
 

scheme_management
Categorical

MISSING

Distinct count    12
Unique (%)        < 0.1%
Missing           3750
Missing (%)       6.5%
Memory size       450.0 KiB

Value               Count    Frequency (%)
VWC                 36143    62.8%
WUG                 4249     7.4%
Water authority     3151     5.5%
WUA                 2882     5.0%
Water Board         2747     4.8%
Parastatal          1607     2.8%
Private operator    1063     1.8%
Company             1061     1.8%
Other               765      1.3%
SWC                 97       0.2%
Trust               72       0.1%
None                1        < 0.1%
(Missing)           3750     6.5%

Length

Max length        16
Median length     3
Mean length       4.576283253
Min length        3

Overview of Unicode Properties

Unique unicode characters    29
Unique unicode categories    3
Unique unicode scripts       2
Unique unicode blocks        1
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.

Most occurring characters

ValueCountFrequency (%) 
W4926918.7%
 
C3730114.2%
 
V3614313.7%
 
a251619.5%
 
t183777.0%
 
r174296.6%
 
o90863.4%
 
e87903.3%
 
n85623.2%
 
U71312.7%
 
69612.6%
 
G42491.6%
 
i42141.6%
 
y42121.6%
 
h39161.5%
 
u32231.2%
 
A28821.1%
 
B27471.0%
 
d27471.0%
 
P26701.0%
 
p21240.8%
 
s16790.6%
 
l16070.6%
 
v10630.4%
 
m10610.4%
 
Other values (4)9350.4%
 

Most occurring categories

ValueCountFrequency (%) 
Uppercase Letter14332754.4%
 
Lowercase Letter11325143.0%
 
Space Separator69612.6%
 

Most frequent Uppercase Letter characters

ValueCountFrequency (%) 
W4926934.4%
 
C3730126.0%
 
V3614325.2%
 
U71315.0%
 
G42493.0%
 
A28822.0%
 
B27471.9%
 
P26701.9%
 
O7650.5%
 
S970.1%
 
T720.1%
 
N1< 0.1%
 

Most frequent Lowercase Letter characters

ValueCountFrequency (%) 
a2516122.2%
 
t1837716.2%
 
r1742915.4%
 
o90868.0%
 
e87907.8%
 
n85627.6%
 
i42143.7%
 
y42123.7%
 
h39163.5%
 
u32232.8%
 
d27472.4%
 
p21241.9%
 
s16791.5%
 
l16071.4%
 
v10630.9%
 
m10610.9%
 

Most frequent Space Separator characters

ValueCountFrequency (%) 
6961100.0%
 

Most occurring scripts

ValueCountFrequency (%) 
Latin25657897.4%
 
Common69612.6%
 

Most frequent Latin characters

ValueCountFrequency (%) 
W4926919.2%
 
C3730114.5%
 
V3614314.1%
 
a251619.8%
 
t183777.2%
 
r174296.8%
 
o90863.5%
 
e87903.4%
 
n85623.3%
 
U71312.8%
 
G42491.7%
 
i42141.6%
 
y42121.6%
 
h39161.5%
 
u32231.3%
 
A28821.1%
 
B27471.1%
 
d27471.1%
 
P26701.0%
 
p21240.8%
 
s16790.7%
 
l16070.6%
 
v10630.4%
 
m10610.4%
 
O7650.3%
 
Other values (3)1700.1%
 

Most frequent Common characters

ValueCountFrequency (%) 
6961100.0%
 

Most occurring blocks

ValueCountFrequency (%) 
ASCII263539100.0%
 

Most frequent ASCII characters

ValueCountFrequency (%) 
W4926918.7%
 
C3730114.2%
 
V3614313.7%
 
a251619.5%
 
t183777.0%
 
r174296.6%
 
o90863.4%
 
e87903.3%
 
n85623.2%
 
U71312.7%
 
69612.6%
 
G42491.6%
 
i42141.6%
 
y42121.6%
 
h39161.5%
 
u32231.2%
 
A28821.1%
 
B27471.0%
 
d27471.0%
 
P26701.0%
 
p21240.8%
 
s16790.6%
 
l16070.6%
 
v10630.4%
 
m10610.4%
 
Other values (4)9350.4%
 

scheme_name
Categorical

HIGH CARDINALITY
MISSING

Distinct count    2658
Unique (%)        8.6%
Missing           26692
Missing (%)       46.3%
Memory size       450.0 KiB

Value                                    Count    Frequency (%)
K                                        682      1.2%
None                                     644      1.1%
Borehole                                 418      0.7%
Chalinze wate                            405      0.7%
M                                        400      0.7%
DANIDA                                   379      0.7%
Government                               320      0.6%
Ngana water supplied scheme              270      0.5%
wanging'ombe water supply s              261      0.5%
wanging'ombe supply scheme               234      0.4%
I                                        229      0.4%
Bagamoyo wate                            229      0.4%
Uroki-Bomang'ombe water sup              209      0.4%
N                                        204      0.4%
Kirua kahe gravity water supply trust    193      0.3%
Machumba estate pipe line                185      0.3%
Makwale water supplied sche              166      0.3%
Kijiji                                   161      0.3%
S                                        154      0.3%
mtwango water supply scheme              152      0.3%
Handeni Trunk Main(H                     152      0.3%
Losaa-Kia water supply                   152      0.3%
Mkongoro Two                             147      0.3%
Roman                                    139      0.2%
Mkongoro One                             128      0.2%
Other values (2633)                      24283    42.2%
(Missing)                                26692    46.3%
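
The two percentages in the scheme_name summary use different denominators. The reported figures are consistent with "Missing (%)" being taken over all observations and "Unique (%)" over the non-missing observations only. The sketch below checks this arithmetic, with variable names chosen for illustration and the inputs copied from this report.

```python
n_rows = 57588       # number of observations (report overview)
n_missing = 26692    # scheme_name: Missing
n_distinct = 2658    # scheme_name: Distinct count

# Share of all rows with no value.
missing_pct = 100 * n_missing / n_rows
# Distinct values relative to the rows that actually have a value.
unique_pct = 100 * n_distinct / (n_rows - n_missing)

print(round(missing_pct, 1), round(unique_pct, 1))
```

Dividing the distinct count by all 57588 rows would give about 4.6%, which does not match the reported 8.6%.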
 

Length

Max length        46
Median length     3
Mean length       9.090383413
Min length        1

Overview of Unicode Properties

Unique unicode characters    66
Unique unicode categories    9
Unique unicode scripts       2
Unique unicode blocks        1
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.

Most occurring characters

ValueCountFrequency (%) 
a7478114.3%
 
n7107613.6%
 
411737.9%
 
e348496.7%
 
i263795.0%
 
p223914.3%
 
r216294.1%
 
t191133.7%
 
u182623.5%
 
o171113.3%
 
l170143.3%
 
s164013.1%
 
w163183.1%
 
m140082.7%
 
y120232.3%
 
g112702.2%
 
M93111.8%
 
h78821.5%
 
K55281.1%
 
d55271.1%
 
k53101.0%
 
b51061.0%
 
c49781.0%
 
N43180.8%
 
S37370.7%
 
Other values (41)380027.3%
 

Most occurring categories

ValueCountFrequency (%) 
Lowercase Letter43038682.2%
 
Uppercase Letter496589.5%
 
Space Separator411737.9%
 
Other Punctuation13010.2%
 
Dash Punctuation5540.1%
 
Open Punctuation191< 0.1%
 
Decimal Number133< 0.1%
 
Modifier Symbol70< 0.1%
 
Close Punctuation31< 0.1%
 

Most frequent Uppercase Letter characters

ValueCountFrequency (%) 
M931118.8%
 
K552811.1%
 
N43188.7%
 
S37377.5%
 
A27295.5%
 
I26885.4%
 
W25015.0%
 
B22594.5%
 
L21064.2%
 
U17903.6%
 
D15763.2%
 
T15433.1%
 
C15263.1%
 
R14072.8%
 
E13362.7%
 
P10472.1%
 
H10232.1%
 
O9551.9%
 
G8991.8%
 
J3850.8%
 
V3690.7%
 
Y2680.5%
 
F2240.5%
 
Z910.2%
 
Q420.1%
 

Most frequent Lowercase Letter characters

ValueCountFrequency (%) 
a7478117.4%
 
n7107616.5%
 
e348498.1%
 
i263796.1%
 
p223915.2%
 
r216295.0%
 
t191134.4%
 
u182624.2%
 
o171114.0%
 
l170144.0%
 
s164013.8%
 
w163183.8%
 
m140083.3%
 
y120232.8%
 
g112702.6%
 
h78821.8%
 
d55271.3%
 
k53101.2%
 
b51061.2%
 
c49781.2%
 
v32550.8%
 
j30620.7%
 
z16460.4%
 
f9550.2%
 
q36< 0.1%
 

Most frequent Space Separator characters

ValueCountFrequency (%) 
41173100.0%
 

Most frequent Other Punctuation characters

ValueCountFrequency (%) 
'92270.9%
 
/37028.4%
 
&80.6%
 
:10.1%
 

Most frequent Dash Punctuation characters

ValueCountFrequency (%) 
-554100.0%
 

Most frequent Decimal Number characters

ValueCountFrequency (%) 
26145.9%
 
35541.4%
 
775.3%
 
543.0%
 
032.3%
 
632.3%
 

Most frequent Open Punctuation characters

ValueCountFrequency (%) 
(191100.0%
 

Most frequent Close Punctuation characters

ValueCountFrequency (%) 
)31100.0%
 

Most frequent Modifier Symbol characters

ValueCountFrequency (%) 
`70100.0%
 

Most occurring scripts

ValueCountFrequency (%) 
Latin48004491.7%
 
Common434538.3%
 

Most frequent Latin characters

ValueCountFrequency (%) 
a7478115.6%
 
n7107614.8%
 
e348497.3%
 
i263795.5%
 
p223914.7%
 
r216294.5%
 
t191134.0%
 
u182623.8%
 
o171113.6%
 
l170143.5%
 
s164013.4%
 
w163183.4%
 
m140082.9%
 
y120232.5%
 
g112702.3%
 
M93111.9%
 
h78821.6%
 
K55281.2%
 
d55271.2%
 
k53101.1%
 
b51061.1%
 
c49781.0%
 
N43180.9%
 
S37370.8%
 
v32550.7%
 
Other values (26)324676.8%
 

Most frequent Common characters

ValueCountFrequency (%) 
4117394.8%
 
'9222.1%
 
-5541.3%
 
/3700.9%
 
(1910.4%
 
`700.2%
 
2610.1%
 
3550.1%
 
)310.1%
 
&8< 0.1%
 
77< 0.1%
 
54< 0.1%
 
03< 0.1%
 
63< 0.1%
 
:1< 0.1%
 

Most occurring blocks

ValueCountFrequency (%) 
ASCII523497100.0%
 

Most frequent ASCII characters

ValueCountFrequency (%) 
a7478114.3%
 
n7107613.6%
 
411737.9%
 
e348496.7%
 
i263795.0%
 
p223914.3%
 
r216294.1%
 
t191133.7%
 
u182623.5%
 
o171113.3%
 
l170143.3%
 
s164013.1%
 
w163183.1%
 
m140082.7%
 
y120232.3%
 
g112702.2%
 
M93111.8%
 
h78821.5%
 
K55281.1%
 
d55271.1%
 
k53101.0%
 
b51061.0%
 
c49781.0%
 
N43180.8%
 
S37370.7%
 
Other values (41)380027.3%
 

permit
Boolean

MISSING

Distinct count    2
Unique (%)        < 0.1%
Missing           3056
Missing (%)       5.3%
Memory size       450.0 KiB

Value        Count    Frequency (%)
True         38100    66.2%
False        16432    28.5%
(Missing)    3056     5.3%

construction_year
Real number (ℝ≥0)

ZEROS

Distinct count    55
Unique (%)        0.1%
Missing           0
Missing (%)       0.0%
Infinite          0
Infinite (%)      0.0%
Mean              1341.577359866639
Minimum           0
Maximum           2013
Zeros             18897
Zeros (%)         32.8%
Memory size       450.0 KiB

Quantile statistics

Minimum                      0
5-th percentile              0
Q1                           0
median                       1988
Q3                           2004
95-th percentile             2010
Maximum                      2013
Range                        2013
Interquartile range (IQR)    2004

Descriptive statistics

Standard deviation               937.6413676
Coefficient of variation (CV)    0.6989096534
Kurtosis                         -1.464166924
Mean                             1341.57736
Median Absolute Deviation (MAD)  20
Skewness                         -0.7316760076
Sum                              77258757
Variance                         879171.3343
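
A mean construction year of about 1342 is a consequence of the 18897 zero entries. Assuming those zeros stand for an unknown year (an assumption, since the report only flags them as ZEROS), the mean over the remaining rows can be recovered from the reported sum and counts alone. The variable names below are illustrative.

```python
n_rows = 57588      # number of observations (report overview)
n_zeros = 18897     # construction_year: Zeros
total = 77258757    # construction_year: Sum

# Mean over all rows, zeros included (matches the reported Mean).
mean_all = total / n_rows
# Mean over non-zero rows only; zeros add nothing to the sum.
mean_nonzero = total / (n_rows - n_zeros)

print(round(mean_all, 2), round(mean_nonzero, 2))
```

Under that assumption the typical construction year lies in the mid-1990s, which fits the reported median of 1988 and Q3 of 2004 better than the raw mean does.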
Histogram with fixed size bins (bins=10)
Common values

Value                Count    Frequency (%)
0                    18897    32.8%
2010                 2645     4.6%
2008                 2613     4.5%
2009                 2533     4.4%
2000                 2091     3.6%
2007                 1587     2.8%
2006                 1471     2.6%
2003                 1286     2.2%
2011                 1256     2.2%
2004                 1123     2.0%
2012                 1084     1.9%
2002                 1075     1.9%
1978                 1037     1.8%
1995                 1014     1.8%
2005                 1011     1.8%
1999                 979      1.7%
1998                 966      1.7%
1990                 954      1.7%
1985                 945      1.6%
1980                 811      1.4%
1996                 811      1.4%
1984                 779      1.4%
1982                 744      1.3%
1994                 738      1.3%
1972                 708      1.2%
Other values (30)    8430     14.6%

Minimum 10 values

Value    Count    Frequency (%)
0        18897    32.8%
1960     102      0.2%
1961     21       < 0.1%
1962     30       0.1%
1963     85       0.1%
1964     40       0.1%
1965     19       < 0.1%
1966     17       < 0.1%
1967     88       0.2%
1968     77       0.1%

Maximum 10 values

Value    Count    Frequency (%)
2013     176      0.3%
2012     1084     1.9%
2011     1256     2.2%
2010     2645     4.6%
2009     2533     4.4%
2008     2613     4.5%
2007     1587     2.8%
2006     1471     2.6%
2005     1011     1.8%
2004     1123     2.0%

extraction_type
Categorical

HIGH CORRELATION

Distinct count    18
Unique (%)        < 0.1%
Missing           0
Missing (%)       0.0%
Memory size       450.0 KiB

Value                        Count    Frequency (%)
gravity                      26696    46.4%
nira/tanira                  7361     12.8%
other                        6160     10.7%
submersible                  4688     8.1%
swn 80                       3448     6.0%
mono                         2817     4.9%
india mark ii                2284     4.0%
afridev                      1659     2.9%
ksb                          1358     2.4%
other - rope pump            451      0.8%
other - swn 81               229      0.4%
windmill                     117      0.2%
india mark iii               91       0.2%
cemo                         90       0.2%
other - play pump            85       0.1%
climax                       32       0.1%
walimi                       20       < 0.1%
other - mkulima/shinyanga    2        < 0.1%

Length

Max length        25
Median length     7
Mean length       7.689032437
Min length        3

Overview of Unicode Properties

Unique unicode characters    29
Unique unicode categories    5
Unique unicode scripts       2
Unique unicode blocks        1
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.

Most occurring characters

ValueCountFrequency (%) 
i5766613.0%
 
r5751813.0%
 
a5533112.5%
 
t409849.3%
 
v283556.4%
 
y267836.0%
 
g266986.0%
 
n237125.4%
 
e185034.2%
 
s144133.3%
 
o131023.0%
 
b107342.4%
 
m106792.4%
 
104972.4%
 
/73631.7%
 
h69291.6%
 
u52261.2%
 
l50611.1%
 
d41510.9%
 
w38140.9%
 
k37350.8%
 
836770.8%
 
034480.8%
 
f16590.4%
 
p16080.4%
 
Other values (4)11500.3%
 

Most occurring categories

ValueCountFrequency (%) 
Lowercase Letter41681594.1%
 
Space Separator104972.4%
 
Other Punctuation73631.7%
 
Decimal Number73541.7%
 
Dash Punctuation7670.2%
 

Most frequent Lowercase Letter characters

ValueCountFrequency (%) 
i5766613.8%
 
r5751813.8%
 
a5533113.3%
 
t409849.8%
 
v283556.8%
 
y267836.4%
 
g266986.4%
 
n237125.7%
 
e185034.4%
 
s144133.5%
 
o131023.1%
 
b107342.6%
 
m106792.6%
 
h69291.7%
 
u52261.3%
 
l50611.2%
 
d41511.0%
 
w38140.9%
 
k37350.9%
 
f16590.4%
 
p16080.4%
 
c122< 0.1%
 
x32< 0.1%
 

Most frequent Space Separator characters

ValueCountFrequency (%) 
10497100.0%
 

Most frequent Decimal Number characters

ValueCountFrequency (%) 
8367750.0%
 
0344846.9%
 
12293.1%
 

Most frequent Other Punctuation characters

ValueCountFrequency (%) 
/7363100.0%
 

Most frequent Dash Punctuation characters

ValueCountFrequency (%) 
-767100.0%
 

Most occurring scripts

ValueCountFrequency (%) 
Latin41681594.1%
 
Common259815.9%
 

Most frequent Latin characters

ValueCountFrequency (%) 
i5766613.8%
 
r5751813.8%
 
a5533113.3%
 
t409849.8%
 
v283556.8%
 
y267836.4%
 
g266986.4%
 
n237125.7%
 
e185034.4%
 
s144133.5%
 
o131023.1%
 
b107342.6%
 
m106792.6%
 
h69291.7%
 
u52261.3%
 
l50611.2%
 
d41511.0%
 
w38140.9%
 
k37350.9%
 
f16590.4%
 
p16080.4%
 
c122< 0.1%
 
x32< 0.1%
 

Most frequent Common characters

ValueCountFrequency (%) 
1049740.4%
 
/736328.3%
 
8367714.2%
 
0344813.3%
 
-7673.0%
 
12290.9%
 

Most occurring blocks

ValueCountFrequency (%) 
ASCII442796100.0%
 

Most frequent ASCII characters

ValueCountFrequency (%) 
i5766613.0%
 
r5751813.0%
 
a5533112.5%
 
t409849.3%
 
v283556.4%
 
y267836.0%
 
g266986.0%
 
n237125.4%
 
e185034.2%
 
s144133.3%
 
o131023.0%
 
b107342.4%
 
m106792.4%
 
104972.4%
 
/73631.7%
 
h69291.6%
 
u52261.2%
 
l50611.1%
 
d41510.9%
 
w38140.9%
 
k37350.8%
 
836770.8%
 
034480.8%
 
f16590.4%
 
p16080.4%
 
Other values (4)11500.3%
 

extraction_type_group
Categorical

HIGH CORRELATION

Distinct count  13
Unique (%)      < 0.1%
Missing         0
Missing (%)     0.0%
Memory size     450.0 KiB

Value            Count  Frequency (%)
gravity          26696  46.4%
nira/tanira      7361   12.8%
other            6160   10.7%
submersible      6046   10.5%
swn 80           3448   6.0%
mono             2817   4.9%
india mark ii    2284   4.0%
afridev          1659   2.9%
rope pump        451    0.8%
other handpump   336    0.6%
other motorpump  122    0.2%
wind-powered     117    0.2%
india mark iii   91     0.2%
 

Length

Max length     15
Median length  7
Mean length    7.843318052
Min length     4

Overview of Unicode Properties

Unique unicode characters  26
Unique unicode categories  5
Unique unicode scripts     2
Unique unicode blocks      1
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.

Most occurring characters

ValueCountFrequency (%) 
i5883113.0%
 
r5880613.0%
 
a5552412.3%
 
t407979.0%
 
v283556.3%
 
g266965.9%
 
y266965.9%
 
n238155.3%
 
e210544.7%
 
s155403.4%
 
o130642.9%
 
m122692.7%
 
b120922.7%
 
91072.0%
 
/73611.6%
 
u69551.5%
 
h69541.5%
 
l60461.3%
 
d46041.0%
 
w36820.8%
 
834480.8%
 
034480.8%
 
p23860.5%
 
k23750.5%
 
f16590.4%
 

Most occurring categories

ValueCountFrequency (%) 
Lowercase Letter42820094.8%
 
Space Separator91072.0%
 
Other Punctuation73611.6%
 
Decimal Number68961.5%
 
Dash Punctuation117< 0.1%
 

Most frequent Lowercase Letter characters

ValueCountFrequency (%) 
i5883113.7%
 
r5880613.7%
 
a5552413.0%
 
t407979.5%
 
v283556.6%
 
g266966.2%
 
y266966.2%
 
n238155.6%
 
e210544.9%
 
s155403.6%
 
o130643.1%
 
m122692.9%
 
b120922.8%
 
u69551.6%
 
h69541.6%
 
l60461.4%
 
d46041.1%
 
w36820.9%
 
p23860.6%
 
k23750.6%
 
f16590.4%
 

Most frequent Space Separator characters

ValueCountFrequency (%) 
9107100.0%
 

Most frequent Decimal Number characters

ValueCountFrequency (%) 
8344850.0%
 
0344850.0%
 

Most frequent Other Punctuation characters

ValueCountFrequency (%) 
/7361100.0%
 

Most frequent Dash Punctuation characters

ValueCountFrequency (%) 
-117100.0%
 

Most occurring scripts

ValueCountFrequency (%) 
Latin42820094.8%
 
Common234815.2%
 

Most frequent Latin characters

ValueCountFrequency (%) 
i5883113.7%
 
r5880613.7%
 
a5552413.0%
 
t407979.5%
 
v283556.6%
 
g266966.2%
 
y266966.2%
 
n238155.6%
 
e210544.9%
 
s155403.6%
 
o130643.1%
 
m122692.9%
 
b120922.8%
 
u69551.6%
 
h69541.6%
 
l60461.4%
 
d46041.1%
 
w36820.9%
 
p23860.6%
 
k23750.6%
 
f16590.4%
 

Most frequent Common characters

ValueCountFrequency (%) 
910738.8%
 
/736131.3%
 
8344814.7%
 
0344814.7%
 
-1170.5%
 

Most occurring blocks

ValueCountFrequency (%) 
ASCII451681100.0%
 

Most frequent ASCII characters

ValueCountFrequency (%) 
i5883113.0%
 
r5880613.0%
 
a5552412.3%
 
t407979.0%
 
v283556.3%
 
g266965.9%
 
y266965.9%
 
n238155.3%
 
e210544.7%
 
s155403.4%
 
o130642.9%
 
m122692.7%
 
b120922.7%
 
91072.0%
 
/73611.6%
 
u69551.5%
 
h69541.5%
 
l60461.3%
 
d46041.0%
 
w36820.8%
 
834480.8%
 
034480.8%
 
p23860.5%
 
k23750.5%
 
f16590.4%
 

extraction_type_class
Categorical

HIGH CORRELATION

Distinct count  7
Unique (%)      < 0.1%
Missing         0
Missing (%)     0.0%
Memory size     450.0 KiB

Value         Count  Frequency (%)
gravity       26696  46.4%
handpump      15179  26.4%
other         6160   10.7%
submersible   6046   10.5%
motorpump     2939   5.1%
rope pump     451    0.8%
wind-powered  117    0.2%
 

Length

Max length     12
Median length  7
Mean length    7.597485587
Min length     5

Overview of Unicode Properties

Unique unicode characters  21
Unique unicode categories  3
Unique unicode scripts     2
Unique unicode blocks      1
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.

Most occurring characters

ValueCountFrequency (%) 
r424099.7%
 
a418759.6%
 
p377068.6%
 
t357958.2%
 
i328597.5%
 
m275546.3%
 
g266966.1%
 
v266966.1%
 
y266966.1%
 
u246155.6%
 
h213394.9%
 
e189374.3%
 
d154133.5%
 
n152963.5%
 
o126062.9%
 
s120922.8%
 
b120922.8%
 
l60461.4%
 
4510.1%
 
w2340.1%
 
-117< 0.1%
 

Most occurring categories

ValueCountFrequency (%) 
Lowercase Letter43695699.9%
 
Space Separator4510.1%
 
Dash Punctuation117< 0.1%
 

Most frequent Lowercase Letter characters

ValueCountFrequency (%) 
r424099.7%
 
a418759.6%
 
p377068.6%
 
t357958.2%
 
i328597.5%
 
m275546.3%
 
g266966.1%
 
v266966.1%
 
y266966.1%
 
u246155.6%
 
h213394.9%
 
e189374.3%
 
d154133.5%
 
n152963.5%
 
o126062.9%
 
s120922.8%
 
b120922.8%
 
l60461.4%
 
w2340.1%
 

Most frequent Dash Punctuation characters

ValueCountFrequency (%) 
-117100.0%
 

Most frequent Space Separator characters

ValueCountFrequency (%) 
451100.0%
 

Most occurring scripts

ValueCountFrequency (%) 
Latin43695699.9%
 
Common5680.1%
 

Most frequent Latin characters

ValueCountFrequency (%) 
r424099.7%
 
a418759.6%
 
p377068.6%
 
t357958.2%
 
i328597.5%
 
m275546.3%
 
g266966.1%
 
v266966.1%
 
y266966.1%
 
u246155.6%
 
h213394.9%
 
e189374.3%
 
d154133.5%
 
n152963.5%
 
o126062.9%
 
s120922.8%
 
b120922.8%
 
l60461.4%
 
w2340.1%
 

Most frequent Common characters

ValueCountFrequency (%) 
45179.4%
 
-11720.6%
 

Most occurring blocks

ValueCountFrequency (%) 
ASCII437524100.0%
 

Most frequent ASCII characters

ValueCountFrequency (%) 
r424099.7%
 
a418759.6%
 
p377068.6%
 
t357958.2%
 
i328597.5%
 
m275546.3%
 
g266966.1%
 
v266966.1%
 
y266966.1%
 
u246155.6%
 
h213394.9%
 
e189374.3%
 
d154133.5%
 
n152963.5%
 
o126062.9%
 
s120922.8%
 
b120922.8%
 
l60461.4%
 
4510.1%
 
w2340.1%
 
-117< 0.1%
 

management
Categorical

HIGH CORRELATION

Distinct count  12
Unique (%)      < 0.1%
Missing         0
Missing (%)     0.0%
Memory size     450.0 KiB

Value             Count  Frequency (%)
vwc               39746  69.0%
wug               5556   9.6%
water board       2932   5.1%
wua               2533   4.4%
private operator  1970   3.4%
parastatal        1696   2.9%
water authority   902    1.6%
other             840    1.5%
company           685    1.2%
unknown           551    1.0%
other - school    99     0.2%
trust             78     0.1%
 

Length

Max length     16
Median length  3
Mean length    4.382770716
Min length     3

Overview of Unicode Properties

Unique unicode characters  23
Unique unicode categories  3
Unique unicode scripts     2
Unique unicode blocks      1
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.

Most occurring characters

ValueCountFrequency (%) 
w5222020.7%
 
v4171616.5%
 
c4053016.1%
 
a216108.6%
 
r162916.5%
 
t140655.6%
 
o101474.0%
 
u96203.8%
 
e87133.5%
 
p63212.5%
 
60022.4%
 
g55562.2%
 
b29321.2%
 
d29321.2%
 
i28721.1%
 
n23380.9%
 
h19400.8%
 
s18730.7%
 
l17950.7%
 
y15870.6%
 
m6850.3%
 
k5510.2%
 
-99< 0.1%
 

Most occurring categories

ValueCountFrequency (%) 
Lowercase Letter24629497.6%
 
Space Separator60022.4%
 
Dash Punctuation99< 0.1%
 

Most frequent Lowercase Letter characters

ValueCountFrequency (%) 
w5222021.2%
 
v4171616.9%
 
c4053016.5%
 
a216108.8%
 
r162916.6%
 
t140655.7%
 
o101474.1%
 
u96203.9%
 
e87133.5%
 
p63212.6%
 
g55562.3%
 
b29321.2%
 
d29321.2%
 
i28721.2%
 
n23380.9%
 
h19400.8%
 
s18730.8%
 
l17950.7%
 
y15870.6%
 
m6850.3%
 
k5510.2%
 

Most frequent Space Separator characters

ValueCountFrequency (%) 
6002100.0%
 

Most frequent Dash Punctuation characters

ValueCountFrequency (%) 
-99100.0%
 

Most occurring scripts

ValueCountFrequency (%) 
Latin24629497.6%
 
Common61012.4%
 

Most frequent Latin characters

ValueCountFrequency (%) 
w5222021.2%
 
v4171616.9%
 
c4053016.5%
 
a216108.8%
 
r162916.6%
 
t140655.7%
 
o101474.1%
 
u96203.9%
 
e87133.5%
 
p63212.6%
 
g55562.3%
 
b29321.2%
 
d29321.2%
 
i28721.2%
 
n23380.9%
 
h19400.8%
 
s18730.8%
 
l17950.7%
 
y15870.6%
 
m6850.3%
 
k5510.2%
 

Most frequent Common characters

ValueCountFrequency (%) 
600298.4%
 
-991.6%
 

Most occurring blocks

ValueCountFrequency (%) 
ASCII252395100.0%
 

Most frequent ASCII characters

ValueCountFrequency (%) 
w5222020.7%
 
v4171616.5%
 
c4053016.1%
 
a216108.6%
 
r162916.5%
 
t140655.6%
 
o101474.0%
 
u96203.8%
 
e87133.5%
 
p63212.5%
 
60022.4%
 
g55562.2%
 
b29321.2%
 
d29321.2%
 
i28721.1%
 
n23380.9%
 
h19400.8%
 
s18730.7%
 
l17950.7%
 
y15870.6%
 
m6850.3%
 
k5510.2%
 
-99< 0.1%
 

management_group
Categorical

HIGH CORRELATION

Distinct count  5
Unique (%)      < 0.1%
Missing         0
Missing (%)     0.0%
Memory size     450.0 KiB

Value       Count  Frequency (%)
user-group  50767  88.2%
commercial  3635   6.3%
parastatal  1696   2.9%
other       939    1.6%
unknown     551    1.0%
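
The HIGH CORRELATION flag on management and management_group can be sanity-checked from the two value tables alone. The sketch below assumes management_group is a coarser roll-up of management. The mapping `assumed_group` is hypothetical: it was inferred only from matching totals and is not stated in the report, which shows no joint counts for the two columns.

```python
from collections import Counter

# Counts copied from the management table earlier in the report.
management_counts = {
    "vwc": 39746,
    "wug": 5556,
    "water board": 2932,
    "wua": 2533,
    "private operator": 1970,
    "parastatal": 1696,
    "water authority": 902,
    "other": 840,
    "company": 685,
    "unknown": 551,
    "other - school": 99,
    "trust": 78,
}

# ASSUMPTION: hypothetical mapping inferred from the marginal totals;
# the report itself does not state which value belongs to which group.
assumed_group = {
    "vwc": "user-group",
    "wug": "user-group",
    "water board": "user-group",
    "wua": "user-group",
    "private operator": "commercial",
    "water authority": "commercial",
    "company": "commercial",
    "trust": "commercial",
    "parastatal": "parastatal",
    "other": "other",
    "other - school": "other",
    "unknown": "unknown",
}

# Roll the fine-grained counts up into the assumed groups.
rolled_up = Counter()
for value, count in management_counts.items():
    rolled_up[assumed_group[value]] += count

# Counts copied from the management_group table above.
reported = {
    "user-group": 50767,
    "commercial": 3635,
    "parastatal": 1696,
    "other": 939,
    "unknown": 551,
}
```

If the rolled-up totals equal the reported ones, the marginals are at least consistent with one column being a function of the other. Confirming that would need a row-level cross-tabulation of the data.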
 

Length

Max length     10
Median length  10
Mean length    9.889768702
Min length     5

Overview of Unicode Properties

Unique unicode characters  18
Unique unicode categories  2
Unique unicode scripts     2
Unique unicode blocks      1
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.

Most occurring characters

ValueCountFrequency (%) 
r10780418.9%
 
u10208517.9%
 
o558929.8%
 
e553419.7%
 
s524639.2%
 
p524639.2%
 
-507678.9%
 
g507678.9%
 
a104191.8%
 
c72701.3%
 
m72701.3%
 
l53310.9%
 
t43310.8%
 
i36350.6%
 
n16530.3%
 
h9390.2%
 
k5510.1%
 
w5510.1%
 

Most occurring categories

ValueCountFrequency (%) 
Lowercase Letter51876591.1%
 
Dash Punctuation507678.9%
 

Most frequent Lowercase Letter characters

ValueCountFrequency (%) 
r10780420.8%
 
u10208519.7%
 
o5589210.8%
 
e5534110.7%
 
s5246310.1%
 
p5246310.1%
 
g507679.8%
 
a104192.0%
 
c72701.4%
 
m72701.4%
 
l53311.0%
 
t43310.8%
 
i36350.7%
 
n16530.3%
 
h9390.2%
 
k5510.1%
 
w5510.1%
 

Most frequent Dash Punctuation characters

ValueCountFrequency (%) 
-50767100.0%
 

Most occurring scripts

ValueCountFrequency (%) 
Latin51876591.1%
 
Common507678.9%
 

Most frequent Latin characters

ValueCountFrequency (%) 
r10780420.8%
 
u10208519.7%
 
o5589210.8%
 
e5534110.7%
 
s5246310.1%
 
p5246310.1%
 
g507679.8%
 
a104192.0%
 
c72701.4%
 
m72701.4%
 
l53311.0%
 
t43310.8%
 
i36350.7%
 
n16530.3%
 
h9390.2%
 
k5510.1%
 
w5510.1%
 

Most frequent Common characters

ValueCountFrequency (%) 
-50767100.0%
 

Most occurring blocks

ValueCountFrequency (%) 
ASCII569532100.0%
 

Most frequent ASCII characters

ValueCountFrequency (%) 
r10780418.9%
 
u10208517.9%
 
o558929.8%
 
e553419.7%
 
s524639.2%
 
p524639.2%
 
-507678.9%
 
g507678.9%
 
a104191.8%
 
c72701.3%
 
m72701.3%
 
l53310.9%
 
t43310.8%
 
i36350.6%
 
n16530.3%
 
h9390.2%
 
k5510.1%
 
w5510.1%
 

payment
Categorical

HIGH CORRELATION

Distinct count  7
Unique (%)      < 0.1%
Missing         0
Missing (%)     0.0%
Memory size     450.0 KiB

Value                  Count  Frequency (%)
never pay              24380  42.3%
pay per bucket         8953   15.5%
pay monthly            8229   14.3%
unknown                7654   13.3%
pay when scheme fails  3843   6.7%
pay annually           3626   6.3%
other                  903    1.6%
 

Length

Max length     21
Median length  9
Mean length    10.72426547
Min length     5

Overview of Unicode Properties

Unique unicode characters  21
Unique unicode categories  2
Unique unicode scripts     2
Unique unicode blocks      1
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.

Most occurring characters

ValueCountFrequency (%) 
e7909812.8%
 
n6666610.8%
 
6567010.6%
 
y608869.9%
 
a601269.7%
 
p579849.4%
 
r342365.5%
 
v243803.9%
 
u202333.3%
 
l193243.1%
 
t180852.9%
 
h168182.7%
 
o167862.7%
 
k166072.7%
 
c127962.1%
 
m120722.0%
 
w114971.9%
 
b89531.4%
 
s76861.2%
 
f38430.6%
 
i38430.6%
 

Most occurring categories

ValueCountFrequency (%) 
Lowercase Letter55191989.4%
 
Space Separator6567010.6%
 

Most frequent Lowercase Letter characters

ValueCountFrequency (%) 
e7909814.3%
 
n6666612.1%
 
y6088611.0%
 
a6012610.9%
 
p5798410.5%
 
r342366.2%
 
v243804.4%
 
u202333.7%
 
l193243.5%
 
t180853.3%
 
h168183.0%
 
o167863.0%
 
k166073.0%
 
c127962.3%
 
m120722.2%
 
w114972.1%
 
b89531.6%
 
s76861.4%
 
f38430.7%
 
i38430.7%
 

Most frequent Space Separator characters

ValueCountFrequency (%) 
65670100.0%
 

Most occurring scripts

ValueCountFrequency (%) 
Latin55191989.4%
 
Common6567010.6%
 

Most frequent Latin characters

ValueCountFrequency (%) 
e7909814.3%
 
n6666612.1%
 
y6088611.0%
 
a6012610.9%
 
p5798410.5%
 
r342366.2%
 
v243804.4%
 
u202333.7%
 
l193243.5%
 
t180853.3%
 
h168183.0%
 
o167863.0%
 
k166073.0%
 
c127962.3%
 
m120722.2%
 
w114972.1%
 
b89531.6%
 
s76861.4%
 
f38430.7%
 
i38430.7%
 

Most frequent Common characters

ValueCountFrequency (%) 
65670100.0%
 

Most occurring blocks

ValueCountFrequency (%) 
ASCII617589100.0%
 

Most frequent ASCII characters

ValueCountFrequency (%) 
e7909812.8%
 
n6666610.8%
 
6567010.6%
 
y608869.9%
 
a601269.7%
 
p579849.4%
 
r342365.5%
 
v243803.9%
 
u202333.3%
 
l193243.1%
 
t180852.9%
 
h168182.7%
 
o167862.7%
 
k166072.7%
 
c127962.1%
 
m120722.0%
 
w114971.9%
 
b89531.4%
 
s76861.2%
 
f38430.6%
 
i38430.6%
 

payment_type
Categorical

HIGH CORRELATION

Distinct count  7
Unique (%)      < 0.1%
Missing         0
Missing (%)     0.0%
Memory size     450.0 KiB

Value       Count  Frequency (%)
never pay   24380  42.3%
per bucket  8953   15.5%
monthly     8229   14.3%
unknown     7654   13.3%
on failure  3843   6.7%
annually    3626   6.3%
other       903    1.6%
 

Length

Max length     10
Median length  9
Mean length    8.544905189
Min length     5

Overview of Unicode Properties

Unique unicode characters  20
Unique unicode categories  2
Unique unicode scripts     2
Unique unicode blocks      1
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.

Most occurring characters

ValueCountFrequency (%) 
e7141214.5%
 
n6666613.5%
 
r380797.7%
 
371767.6%
 
y362357.4%
 
a354757.2%
 
p333336.8%
 
v243805.0%
 
u240764.9%
 
o206294.2%
 
l193243.9%
 
t180853.7%
 
k166073.4%
 
h91321.9%
 
b89531.8%
 
c89531.8%
 
m82291.7%
 
w76541.6%
 
f38430.8%
 
i38430.8%
 

Most occurring categories

ValueCountFrequency (%) 
Lowercase Letter45490892.4%
 
Space Separator371767.6%
 

Most frequent Lowercase Letter characters

ValueCountFrequency (%) 
e7141215.7%
 
n6666614.7%
 
r380798.4%
 
y362358.0%
 
a354757.8%
 
p333337.3%
 
v243805.4%
 
u240765.3%
 
o206294.5%
 
l193244.2%
 
t180854.0%
 
k166073.7%
 
h91322.0%
 
b89532.0%
 
c89532.0%
 
m82291.8%
 
w76541.7%
 
f38430.8%
 
i38430.8%
 

Most frequent Space Separator characters

ValueCountFrequency (%) 
37176100.0%
 

Most occurring scripts

ValueCountFrequency (%) 
Latin45490892.4%
 
Common371767.6%
 

Most frequent Latin characters

ValueCountFrequency (%) 
e7141215.7%
 
n6666614.7%
 
r380798.4%
 
y362358.0%
 
a354757.8%
 
p333337.3%
 
v243805.4%
 
u240765.3%
 
o206294.5%
 
l193244.2%
 
t180854.0%
 
k166073.7%
 
h91322.0%
 
b89532.0%
 
c89532.0%
 
m82291.8%
 
w76541.7%
 
f38430.8%
 
i38430.8%
 

Most frequent Common characters

ValueCountFrequency (%) 
37176100.0%
 

Most occurring blocks

ValueCountFrequency (%) 
ASCII492084100.0%
 

Most frequent ASCII characters

ValueCountFrequency (%) 
e7141214.5%
 
n6666613.5%
 
r380797.7%
 
371767.6%
 
y362357.4%
 
a354757.2%
 
p333336.8%
 
v243805.0%
 
u240764.9%
 
o206294.2%
 
l193243.9%
 
t180853.7%
 
k166073.4%
 
h91321.9%
 
b89531.8%
 
c89531.8%
 
m82291.7%
 
w76541.6%
 
f38430.8%
 
i38430.8%
 

water_quality
Categorical

HIGH CORRELATION

Distinct count  8
Unique (%)      < 0.1%
Missing         0
Missing (%)     0.0%
Memory size     450.0 KiB

Value               Count  Frequency (%)
soft                49431  85.8%
salty               4772   8.3%
unknown             1661   2.9%
milky               803    1.4%
coloured            479    0.8%
salty abandoned     228    0.4%
fluoride            199    0.3%
fluoride abandoned  15     < 0.1%
 

Length

Max length     18
Median length  4
Mean length    4.277627283
Min length     4

Overview of Unicode Properties

Unique unicode characters  19
Unique unicode categories  2
Unique unicode scripts     2
Unique unicode blocks      1
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.

Most occurring characters

ValueCountFrequency (%) 
s5443122.1%
 
t5443122.1%
 
o5250721.3%
 
f4964520.2%
 
l64962.6%
 
y58032.4%
 
a54862.2%
 
n54692.2%
 
k24641.0%
 
u23541.0%
 
w16610.7%
 
d11790.5%
 
i10170.4%
 
e9360.4%
 
m8030.3%
 
r6930.3%
 
c4790.2%
 
2430.1%
 
b2430.1%
 

Most occurring categories

ValueCountFrequency (%) 
Lowercase Letter24609799.9%
 
Space Separator2430.1%
 

Most frequent Lowercase Letter characters

ValueCountFrequency (%) 
s5443122.1%
 
t5443122.1%
 
o5250721.3%
 
f4964520.2%
 
l64962.6%
 
y58032.4%
 
a54862.2%
 
n54692.2%
 
k24641.0%
 
u23541.0%
 
w16610.7%
 
d11790.5%
 
i10170.4%
 
e9360.4%
 
m8030.3%
 
r6930.3%
 
c4790.2%
 
b2430.1%
 

Most frequent Space Separator characters

ValueCountFrequency (%) 
243100.0%
 

Most occurring scripts

ValueCountFrequency (%) 
Latin24609799.9%
 
Common2430.1%
 

Most frequent Latin characters

ValueCountFrequency (%) 
s5443122.1%
 
t5443122.1%
 
o5250721.3%
 
f4964520.2%
 
l64962.6%
 
y58032.4%
 
a54862.2%
 
n54692.2%
 
k24641.0%
 
u23541.0%
 
w16610.7%
 
d11790.5%
 
i10170.4%
 
e9360.4%
 
m8030.3%
 
r6930.3%
 
c4790.2%
 
b2430.1%
 

Most frequent Common characters

ValueCountFrequency (%) 
243100.0%
 

Most occurring blocks

ValueCountFrequency (%) 
ASCII246340100.0%
 

Most frequent ASCII characters

ValueCountFrequency (%) 
s5443122.1%
 
t5443122.1%
 
o5250721.3%
 
f4964520.2%
 
l64962.6%
 
y58032.4%
 
a54862.2%
 
n54692.2%
 
k24641.0%
 
u23541.0%
 
w16610.7%
 
d11790.5%
 
i10170.4%
 
e9360.4%
 
m8030.3%
 
r6930.3%
 
c4790.2%
 
2430.1%
 
b2430.1%
 

quality_group
Categorical

HIGH CORRELATION

Distinct count  6
Unique (%)      < 0.1%
Missing         0
Missing (%)     0.0%
Memory size     450.0 KiB

Value     Count  Frequency (%)
good      49431  85.8%
salty     5000   8.7%
unknown   1661   2.9%
milky     803    1.4%
colored   479    0.8%
fluoride  214    0.4%
 

Length

Max length     8
Median length  4
Mean length    4.227113287
Min length     4

Overview of Unicode Properties

Unique unicode characters  18
Unique unicode categories  1
Unique unicode scripts     1
Unique unicode blocks      1
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.

Most occurring characters

ValueCountFrequency (%) 
o10169541.8%
 
d5012420.6%
 
g4943120.3%
 
l64962.7%
 
y58032.4%
 
s50002.1%
 
a50002.1%
 
t50002.1%
 
n49832.0%
 
k24641.0%
 
u18750.8%
 
w16610.7%
 
i10170.4%
 
m8030.3%
 
r6930.3%
 
e6930.3%
 
c4790.2%
 
f2140.1%
 

Most occurring categories

ValueCountFrequency (%) 
Lowercase Letter243431100.0%
 

Most frequent Lowercase Letter characters

ValueCountFrequency (%) 
o10169541.8%
 
d5012420.6%
 
g4943120.3%
 
l64962.7%
 
y58032.4%
 
s50002.1%
 
a50002.1%
 
t50002.1%
 
n49832.0%
 
k24641.0%
 
u18750.8%
 
w16610.7%
 
i10170.4%
 
m8030.3%
 
r6930.3%
 
e6930.3%
 
c4790.2%
 
f2140.1%
 

Most occurring scripts

ValueCountFrequency (%) 
Latin243431100.0%
 

Most frequent Latin characters

ValueCountFrequency (%) 
o10169541.8%
 
d5012420.6%
 
g4943120.3%
 
l64962.7%
 
y58032.4%
 
s50002.1%
 
a50002.1%
 
t50002.1%
 
n49832.0%
 
k24641.0%
 
u18750.8%
 
w16610.7%
 
i10170.4%
 
m8030.3%
 
r6930.3%
 
e6930.3%
 
c4790.2%
 
f2140.1%
 

Most occurring blocks

ValueCountFrequency (%) 
ASCII243431100.0%
 

Most frequent ASCII characters

ValueCountFrequency (%) 
o10169541.8%
 
d5012420.6%
 
g4943120.3%
 
l64962.7%
 
y58032.4%
 
s50002.1%
 
a50002.1%
 
t50002.1%
 
n49832.0%
 
k24641.0%
 
u18750.8%
 
w16610.7%
 
i10170.4%
 
m8030.3%
 
r6930.3%
 
e6930.3%
 
c4790.2%
 
f2140.1%
 

quantity
Categorical

HIGH CORRELATION

Distinct count  5
Unique (%)      < 0.1%
Missing         0
Missing (%)     0.0%
Memory size     450.0 KiB

Value         Count  Frequency (%)
enough        32260  56.0%
insufficient  14564  25.3%
dry           5990   10.4%
seasonal      4001   6.9%
unknown       773    1.3%
 

Length

Max length     12
Median length  6
Mean length    7.357730777
Min length     3

Overview of Unicode Properties

Unique unicode characters  18
Unique unicode categories  1
Unique unicode scripts     1
Unique unicode blocks      1
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.

Most occurring characters

ValueCountFrequency (%) 
n6770816.0%
 
e5082512.0%
 
u4759711.2%
 
i4369210.3%
 
o370348.7%
 
g322607.6%
 
h322607.6%
 
f291286.9%
 
s225665.3%
 
c145643.4%
 
t145643.4%
 
a80021.9%
 
d59901.4%
 
r59901.4%
 
y59901.4%
 
l40010.9%
 
k7730.2%
 
w7730.2%
 

Most occurring categories

ValueCountFrequency (%) 
Lowercase Letter423717100.0%
 

Most frequent Lowercase Letter characters

ValueCountFrequency (%) 
n6770816.0%
 
e5082512.0%
 
u4759711.2%
 
i4369210.3%
 
o370348.7%
 
g322607.6%
 
h322607.6%
 
f291286.9%
 
s225665.3%
 
c145643.4%
 
t145643.4%
 
a80021.9%
 
d59901.4%
 
r59901.4%
 
y59901.4%
 
l40010.9%
 
k7730.2%
 
w7730.2%
 

Most occurring scripts

ValueCountFrequency (%) 
Latin423717100.0%
 

Most frequent Latin characters

ValueCountFrequency (%) 
n6770816.0%
 
e5082512.0%
 
u4759711.2%
 
i4369210.3%
 
o370348.7%
 
g322607.6%
 
h322607.6%
 
f291286.9%
 
s225665.3%
 
c145643.4%
 
t145643.4%
 
a80021.9%
 
d59901.4%
 
r59901.4%
 
y59901.4%
 
l40010.9%
 
k7730.2%
 
w7730.2%
 

Most occurring blocks

ValueCountFrequency (%) 
ASCII423717100.0%
 

Most frequent ASCII characters

ValueCountFrequency (%) 
n6770816.0%
 
e5082512.0%
 
u4759711.2%
 
i4369210.3%
 
o370348.7%
 
g322607.6%
 
h322607.6%
 
f291286.9%
 
s225665.3%
 
c145643.4%
 
t145643.4%
 
a80021.9%
 
d59901.4%
 
r59901.4%
 
y59901.4%
 
l40010.9%
 
k7730.2%
 
w7730.2%
 

quantity_group
Categorical

HIGH CORRELATION

Distinct count  5
Unique (%)      < 0.1%
Missing         0
Missing (%)     0.0%
Memory size     450.0 KiB

Value         Count  Frequency (%)
enough        32260  56.0%
insufficient  14564  25.3%
dry           5990   10.4%
seasonal      4001   6.9%
unknown       773    1.3%
 

Length

Max length     12
Median length  6
Mean length    7.357730777
Min length     3

Overview of Unicode Properties

Unique unicode characters  18
Unique unicode categories  1
Unique unicode scripts     1
Unique unicode blocks      1
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.

Most occurring characters

ValueCountFrequency (%) 
n6770816.0%
 
e5082512.0%
 
u4759711.2%
 
i4369210.3%
 
o370348.7%
 
g322607.6%
 
h322607.6%
 
f291286.9%
 
s225665.3%
 
c145643.4%
 
t145643.4%
 
a80021.9%
 
d59901.4%
 
r59901.4%
 
y59901.4%
 
l40010.9%
 
k7730.2%
 
w7730.2%
 

Most occurring categories

ValueCountFrequency (%) 
Lowercase Letter423717100.0%
 

Most frequent Lowercase Letter characters

ValueCountFrequency (%) 
n6770816.0%
 
e5082512.0%
 
u4759711.2%
 
i4369210.3%
 
o370348.7%
 
g322607.6%
 
h322607.6%
 
f291286.9%
 
s225665.3%
 
c145643.4%
 
t145643.4%
 
a80021.9%
 
d59901.4%
 
r59901.4%
 
y59901.4%
 
l40010.9%
 
k7730.2%
 
w7730.2%
 

Most occurring scripts

ValueCountFrequency (%) 
Latin423717100.0%
 

Most frequent Latin characters

ValueCountFrequency (%) 
n6770816.0%
 
e5082512.0%
 
u4759711.2%
 
i4369210.3%
 
o370348.7%
 
g322607.6%
 
h322607.6%
 
f291286.9%
 
s225665.3%
 
c145643.4%
 
t145643.4%
 
a80021.9%
 
d59901.4%
 
r59901.4%
 
y59901.4%
 
l40010.9%
 
k7730.2%
 
w7730.2%
 

Most occurring blocks

ValueCountFrequency (%) 
ASCII423717100.0%
 

Most frequent ASCII characters

ValueCountFrequency (%) 
n6770816.0%
 
e5082512.0%
 
u4759711.2%
 
i4369210.3%
 
o370348.7%
 
g322607.6%
 
h322607.6%
 
f291286.9%
 
s225665.3%
 
c145643.4%
 
t145643.4%
 
a80021.9%
 
d59901.4%
 
r59901.4%
 
y59901.4%
 
l40010.9%
 
k7730.2%
 
w7730.2%
 

source
Categorical

HIGH CORRELATION

Distinct count  10
Unique (%)      < 0.1%
Missing         0
Missing (%)     0.0%
Memory size     450.0 KiB

Value                 Count  Frequency (%)
spring                17006  29.5%
shallow well          15499  26.9%
machine dbh           10826  18.8%
river                 9612   16.7%
rainwater harvesting  2218   3.9%
hand dtw              873    1.5%
dam                   649    1.1%
lake                  639    1.1%
other                 202    0.4%
unknown               64     0.1%
 

Length

Max length     20
Median length  8
Mean length    8.898989373
Min length     3

Overview of Unicode Properties

Unique unicode characters  21
Unique unicode categories  2
Unique unicode scripts     2
Unique unicode blocks      1
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.

Most occurring characters

ValueCountFrequency (%) 
l6263512.2%
 
r430868.4%
 
i418808.2%
 
e412148.0%
 
h404447.9%
 
a351406.9%
 
s347236.8%
 
w341536.7%
 
n333336.5%
 
294165.7%
 
g192243.8%
 
p170063.3%
 
o157653.1%
 
d132212.6%
 
v118302.3%
 
m114752.2%
 
c108262.1%
 
b108262.1%
 
t55111.1%
 
k7030.1%
 
u64< 0.1%
 

Most occurring categories

ValueCountFrequency (%) 
Lowercase Letter48305994.3%
 
Space Separator294165.7%
 

Most frequent Lowercase Letter characters

ValueCountFrequency (%) 
l6263513.0%
 
r430868.9%
 
i418808.7%
 
e412148.5%
 
h404448.4%
 
a351407.3%
 
s347237.2%
 
w341537.1%
 
n333336.9%
 
g192244.0%
 
p170063.5%
 
o157653.3%
 
d132212.7%
 
v118302.4%
 
m114752.4%
 
c108262.2%
 
b108262.2%
 
t55111.1%
 
k7030.1%
 
u64< 0.1%
 

Most frequent Space Separator characters

ValueCountFrequency (%) 
29416100.0%
 

Most occurring scripts

ValueCountFrequency (%) 
Latin48305994.3%
 
Common294165.7%
 

Most frequent Latin characters

ValueCountFrequency (%) 
l6263513.0%
 
r430868.9%
 
i418808.7%
 
e412148.5%
 
h404448.4%
 
a351407.3%
 
s347237.2%
 
w341537.1%
 
n333336.9%
 
g192244.0%
 
p170063.5%
 
o157653.3%
 
d132212.7%
 
v118302.4%
 
m114752.4%
 
c108262.2%
 
b108262.2%
 
t55111.1%
 
k7030.1%
 
u64< 0.1%
 

Most frequent Common characters

ValueCountFrequency (%) 
29416100.0%
 

Most occurring blocks

ValueCountFrequency (%) 
ASCII512475100.0%
 

Most frequent ASCII characters

ValueCountFrequency (%) 
l6263512.2%
 
r430868.4%
 
i418808.2%
 
e412148.0%
 
h404447.9%
 
a351406.9%
 
s347236.8%
 
w341536.7%
 
n333336.5%
 
294165.7%
 
g192243.8%
 
p170063.3%
 
o157653.1%
 
d132212.6%
 
v118302.3%
 
m114752.2%
 
c108262.1%
 
b108262.1%
 
t55111.1%
 
k7030.1%
 
u64< 0.1%
 

source_type
Categorical

HIGH CORRELATION

Distinct count  7
Unique (%)      < 0.1%
Missing         0
Missing (%)     0.0%
Memory size     450.0 KiB

Value                 Count  Frequency (%)
spring                17006  29.5%
shallow well          15499  26.9%
borehole              11699  20.3%
river/lake            10251  17.8%
rainwater harvesting  2218   3.9%
dam                   649    1.1%
other                 266    0.5%
 

Length

Max length     20
Median length  8
Mean length    9.233920261
Min length     3

Overview of Unicode Properties

Unique unicode characters  20
Unique unicode categories  3
Unique unicode scripts     2
Unique unicode blocks      1
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.

Most occurring characters

ValueCountFrequency (%) 
l8394615.8%
 
e6410112.1%
 
r5612710.6%
 
o391637.4%
 
s347236.5%
 
w332166.2%
 
a330536.2%
 
i316936.0%
 
h296825.6%
 
n214424.0%
 
g192243.6%
 
177173.3%
 
p170063.2%
 
v124692.3%
 
b116992.2%
 
/102511.9%
 
k102511.9%
 
t47020.9%
 
d6490.1%
 
m6490.1%
 

Most occurring categories

ValueCountFrequency (%) 
Lowercase Letter50379594.7%
 
Space Separator177173.3%
 
Other Punctuation102511.9%
 

Most frequent Lowercase Letter characters

ValueCountFrequency (%) 
l8394616.7%
 
e6410112.7%
 
r5612711.1%
 
o391637.8%
 
s347236.9%
 
w332166.6%
 
a330536.6%
 
i316936.3%
 
h296825.9%
 
n214424.3%
 
g192243.8%
 
p170063.4%
 
v124692.5%
 
b116992.3%
 
k102512.0%
 
t47020.9%
 
d6490.1%
 
m6490.1%
 

Most frequent Space Separator characters

ValueCountFrequency (%) 
17717100.0%
 

Most frequent Other Punctuation characters

ValueCountFrequency (%) 
/10251100.0%
 

Most occurring scripts

ValueCountFrequency (%) 
Latin50379594.7%
 
Common279685.3%
 

Most frequent Latin characters

ValueCountFrequency (%) 
l8394616.7%
 
e6410112.7%
 
r5612711.1%
 
o391637.8%
 
s347236.9%
 
w332166.6%
 
a330536.6%
 
i316936.3%
 
h296825.9%
 
n214424.3%
 
g192243.8%
 
p170063.4%
 
v124692.5%
 
b116992.3%
 
k102512.0%
 
t47020.9%
 
d6490.1%
 
m6490.1%
 

Most frequent Common characters

ValueCountFrequency (%) 
1771763.3%
 
/1025136.7%
 

Most occurring blocks

ValueCountFrequency (%) 
ASCII531763100.0%
 

Most frequent ASCII characters

ValueCountFrequency (%) 
l8394615.8%
 
e6410112.1%
 
r5612710.6%
 
o391637.4%
 
s347236.5%
 
w332166.2%
 
a330536.2%
 
i316936.0%
 
h296825.6%
 
n214424.0%
 
g192243.6%
 
177173.3%
 
p170063.2%
 
v124692.3%
 
b116992.2%
 
/102511.9%
 
k102511.9%
 
t47020.9%
 
d6490.1%
 
m6490.1%
 

source_class
Categorical

HIGH CORRELATION

Distinct count  3
Unique (%)      < 0.1%
Missing         0
Missing (%)     0.0%
Memory size     450.0 KiB

Value        Count  Frequency (%)
groundwater  44204  76.8%
surface      13118  22.8%
unknown      266    0.5%
 

Length

Max length     11
Median length  11
Mean length    10.07036188
Min length     7

Overview of Unicode Properties

Unique unicode characters  14
Unique unicode categories  1
Unique unicode scripts     1
Unique unicode blocks      1
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.

Most occurring characters

ValueCountFrequency (%) 
r10152617.5%
 
u575889.9%
 
a573229.9%
 
e573229.9%
 
n450027.8%
 
o444707.7%
 
w444707.7%
 
g442047.6%
 
d442047.6%
 
t442047.6%
 
s131182.3%
 
f131182.3%
 
c131182.3%
 
k266< 0.1%
 

Most occurring categories

ValueCountFrequency (%) 
Lowercase Letter579932100.0%
 

Most frequent Lowercase Letter characters

ValueCountFrequency (%) 
r10152617.5%
 
u575889.9%
 
a573229.9%
 
e573229.9%
 
n450027.8%
 
o444707.7%
 
w444707.7%
 
g442047.6%
 
d442047.6%
 
t442047.6%
 
s131182.3%
 
f131182.3%
 
c131182.3%
 
k266< 0.1%
 

Most occurring scripts

ValueCountFrequency (%) 
Latin579932100.0%
 

Most frequent Latin characters

ValueCountFrequency (%) 
r10152617.5%
 
u575889.9%
 
a573229.9%
 
e573229.9%
 
n450027.8%
 
o444707.7%
 
w444707.7%
 
g442047.6%
 
d442047.6%
 
t442047.6%
 
s131182.3%
 
f131182.3%
 
c131182.3%
 
k266< 0.1%
 

Most occurring blocks

ValueCountFrequency (%) 
ASCII579932100.0%
 

Most frequent ASCII characters

ValueCountFrequency (%) 
r10152617.5%
 
u575889.9%
 
a573229.9%
 
e573229.9%
 
n450027.8%
 
o444707.7%
 
w444707.7%
 
g442047.6%
 
d442047.6%
 
t442047.6%
 
s131182.3%
 
f131182.3%
 
c131182.3%
 
k266< 0.1%
 

waterpoint_type
Categorical

HIGH CORRELATION

Distinct count: 7
Unique (%): < 0.1%
Missing: 0
Missing (%): 0.0%
Memory size: 450.0 KiB

Value                          Count    Frequency (%)
communal standpipe             28375    49.3%
hand pump                      16181    28.1%
other                          6167     10.7%
communal standpipe multiple    5959     10.3%
improved spring                783      1.4%
cattle trough                  116      0.2%
dam                            7        < 0.1%

Length

Max length: 27
Median length: 18
Mean length: 14.95764743
Min length: 3

Overview of Unicode Properties

Unique unicode characters: 18
Unique unicode categories: 2
Unique unicode scripts: 2
Unique unicode blocks: 1
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.

Most occurring characters

ValueCountFrequency (%) 
p10855512.6%
 
m9159810.6%
 
n856329.9%
 
a849729.9%
 
573736.7%
 
u565906.6%
 
d513056.0%
 
e473595.5%
 
t468085.4%
 
l463685.4%
 
i418594.9%
 
o414004.8%
 
s351174.1%
 
c344504.0%
 
h224642.6%
 
r78490.9%
 
g8990.1%
 
v7830.1%
 

Most occurring categories

ValueCountFrequency (%) 
Lowercase Letter80400893.3%
 
Space Separator573736.7%
 

Most frequent Lowercase Letter characters

ValueCountFrequency (%) 
p10855513.5%
 
m9159811.4%
 
n8563210.7%
 
a8497210.6%
 
u565907.0%
 
d513056.4%
 
e473595.9%
 
t468085.8%
 
l463685.8%
 
i418595.2%
 
o414005.1%
 
s351174.4%
 
c344504.3%
 
h224642.8%
 
r78491.0%
 
g8990.1%
 
v7830.1%
 

Most frequent Space Separator characters

ValueCountFrequency (%) 
57373100.0%
 

Most occurring scripts

ValueCountFrequency (%) 
Latin80400893.3%
 
Common573736.7%
 

Most frequent Latin characters

ValueCountFrequency (%) 
p10855513.5%
 
m9159811.4%
 
n8563210.7%
 
a8497210.6%
 
u565907.0%
 
d513056.4%
 
e473595.9%
 
t468085.8%
 
l463685.8%
 
i418595.2%
 
o414005.1%
 
s351174.4%
 
c344504.3%
 
h224642.8%
 
r78491.0%
 
g8990.1%
 
v7830.1%
 

Most frequent Common characters

ValueCountFrequency (%) 
57373100.0%
 

Most occurring blocks

ValueCountFrequency (%) 
ASCII861381100.0%
 

Most frequent ASCII characters

ValueCountFrequency (%) 
p10855512.6%
 
m9159810.6%
 
n856329.9%
 
a849729.9%
 
573736.7%
 
u565906.6%
 
d513056.0%
 
e473595.5%
 
t468085.4%
 
l463685.4%
 
i418594.9%
 
o414004.8%
 
s351174.1%
 
c344504.0%
 
h224642.6%
 
r78490.9%
 
g8990.1%
 
v7830.1%
 

waterpoint_type_group
Categorical

HIGH CORRELATION

Distinct count: 6
Unique (%): < 0.1%
Missing: 0
Missing (%): 0.0%
Memory size: 450.0 KiB

Value                 Count    Frequency (%)
communal standpipe    34334    59.6%
hand pump             16181    28.1%
other                 6167     10.7%
improved spring       783      1.4%
cattle trough         116      0.2%
dam                   7        < 0.1%

Length

Max length: 18
Median length: 18
Mean length: 14.02635966
Min length: 3

Overview of Unicode Properties

Unique unicode characters: 18
Unique unicode categories: 2
Unique unicode scripts: 2
Unique unicode blocks: 1
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.

Most occurring characters

ValueCountFrequency (%) 
p10259612.7%
 
m8563910.6%
 
n8563210.6%
 
a8497210.5%
 
514146.4%
 
d513056.4%
 
u506316.3%
 
o414005.1%
 
e414005.1%
 
t408495.1%
 
i359004.4%
 
s351174.3%
 
c344504.3%
 
l344504.3%
 
h224642.8%
 
r78491.0%
 
g8990.1%
 
v7830.1%
 

Most occurring categories

ValueCountFrequency (%) 
Lowercase Letter75633693.6%
 
Space Separator514146.4%
 

Most frequent Lowercase Letter characters

ValueCountFrequency (%) 
p10259613.6%
 
m8563911.3%
 
n8563211.3%
 
a8497211.2%
 
d513056.8%
 
u506316.7%
 
o414005.5%
 
e414005.5%
 
t408495.4%
 
i359004.7%
 
s351174.6%
 
c344504.6%
 
l344504.6%
 
h224643.0%
 
r78491.0%
 
g8990.1%
 
v7830.1%
 

Most frequent Space Separator characters

ValueCountFrequency (%) 
51414100.0%
 

Most occurring scripts

ValueCountFrequency (%) 
Latin75633693.6%
 
Common514146.4%
 

Most frequent Latin characters

ValueCountFrequency (%) 
p10259613.6%
 
m8563911.3%
 
n8563211.3%
 
a8497211.2%
 
d513056.8%
 
u506316.7%
 
o414005.5%
 
e414005.5%
 
t408495.4%
 
i359004.7%
 
s351174.6%
 
c344504.6%
 
l344504.6%
 
h224643.0%
 
r78491.0%
 
g8990.1%
 
v7830.1%
 

Most frequent Common characters

ValueCountFrequency (%) 
51414100.0%
 

Most occurring blocks

ValueCountFrequency (%) 
ASCII807750100.0%
 

Most frequent ASCII characters

ValueCountFrequency (%) 
p10259612.7%
 
m8563910.6%
 
n8563210.6%
 
a8497210.5%
 
514146.4%
 
d513056.4%
 
u506316.3%
 
o414005.1%
 
e414005.1%
 
t408495.1%
 
i359004.4%
 
s351174.3%
 
c344504.3%
 
l344504.3%
 
h224642.8%
 
r78491.0%
 
g8990.1%
 
v7830.1%
 

status_group
Categorical

Distinct count: 3
Unique (%): < 0.1%
Missing: 0
Missing (%): 0.0%
Memory size: 450.0 KiB

Value                      Count    Frequency (%)
functional                 31389    54.5%
non functional             22268    38.7%
functional needs repair    3931     6.8%

Length

Max length: 23
Median length: 10
Mean length: 12.43410085
Min length: 10

Overview of Unicode Properties

Unique unicode characters: 15
Unique unicode categories: 2
Unique unicode scripts: 2
Unique unicode blocks: 1
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.

Most occurring characters

ValueCountFrequency (%) 
n16364322.9%
 
o7985611.2%
 
i615198.6%
 
a615198.6%
 
f575888.0%
 
u575888.0%
 
c575888.0%
 
t575888.0%
 
l575888.0%
 
301304.2%
 
e117931.6%
 
r78621.1%
 
d39310.5%
 
s39310.5%
 
p39310.5%
 

Most occurring categories

ValueCountFrequency (%) 
Lowercase Letter68592595.8%
 
Space Separator301304.2%
 

Most frequent Lowercase Letter characters

ValueCountFrequency (%) 
n16364323.9%
 
o7985611.6%
 
i615199.0%
 
a615199.0%
 
f575888.4%
 
u575888.4%
 
c575888.4%
 
t575888.4%
 
l575888.4%
 
e117931.7%
 
r78621.1%
 
d39310.6%
 
s39310.6%
 
p39310.6%
 

Most frequent Space Separator characters

ValueCountFrequency (%) 
30130100.0%
 

Most occurring scripts

ValueCountFrequency (%) 
Latin68592595.8%
 
Common301304.2%
 

Most frequent Latin characters

ValueCountFrequency (%) 
n16364323.9%
 
o7985611.6%
 
i615199.0%
 
a615199.0%
 
f575888.4%
 
u575888.4%
 
c575888.4%
 
t575888.4%
 
l575888.4%
 
e117931.7%
 
r78621.1%
 
d39310.6%
 
s39310.6%
 
p39310.6%
 

Most frequent Common characters

ValueCountFrequency (%) 
30130100.0%
 

Most occurring blocks

ValueCountFrequency (%) 
ASCII716055100.0%
 

Most frequent ASCII characters

ValueCountFrequency (%) 
n16364322.9%
 
o7985611.2%
 
i615198.6%
 
a615198.6%
 
f575888.0%
 
u575888.0%
 
c575888.0%
 
t575888.0%
 
l575888.0%
 
301304.2%
 
e117931.6%
 
r78621.1%
 
d39310.5%
 
s39310.5%
 
p39310.5%
 

geometry
Categorical

HIGH CARDINALITY
UNIFORM

Distinct count: 57519
Unique (%): 99.9%
Missing: 0
Missing (%): 0.0%
Memory size: 450.0 KiB
POINT (37.53051463 -6.96356538)
 
2
POINT (33.00627548 -2.51995041)
 
2
POINT (32.91986139 -2.47667983)
 
2
POINT (32.9780624 -2.51532072)
 
2
POINT (39.09837398 -6.98360619)
 
2
Other values (57514)
57578
ValueCountFrequency (%) 
POINT (37.53051463 -6.96356538)2< 0.1%
 
POINT (33.00627548 -2.51995041)2< 0.1%
 
POINT (32.91986139 -2.47667983)2< 0.1%
 
POINT (32.9780624 -2.51532072)2< 0.1%
 
POINT (39.09837398 -6.98360619)2< 0.1%
 
POINT (37.25011096 -7.10462503)2< 0.1%
 
POINT (32.95652279 -2.4943533)2< 0.1%
 
POINT (39.08628657 -6.99073094)2< 0.1%
 
POINT (32.98856004 -2.48937845)2< 0.1%
 
POINT (39.09906887 -6.98012199)2< 0.1%
 
POINT (37.53277831 -6.96247516)2< 0.1%
 
POINT (37.32890522 -7.17517443)2< 0.1%
 
POINT (39.09138014 -6.97832237)2< 0.1%
 
POINT (37.5433506 -6.96355665)2< 0.1%
 
POINT (32.92601185 -2.46390984)2< 0.1%
 
POINT (32.96573445 -2.5042939)2< 0.1%
 
POINT (39.08596496 -6.99129411)2< 0.1%
 
POINT (39.11921037 -6.99470401)2< 0.1%
 
POINT (32.98478963 -2.49645868)2< 0.1%
 
POINT (37.27435243 -7.10200368)2< 0.1%
 
POINT (37.37401655 -7.05692253)2< 0.1%
 
POINT (39.09206155 -6.98188419)2< 0.1%
 
POINT (37.33981057 -7.06537264)2< 0.1%
 
POINT (39.09851362 -6.980220399999999)2< 0.1%
 
POINT (32.95559708 -2.50162744)2< 0.1%
 
Other values (57494)5753899.9%
 

Length

Max length: 44
Median length: 31
Mean length: 31.75244843
Min length: 25

Overview of Unicode Properties

Unique unicode characters: 20
Unique unicode categories: 7
Unique unicode scripts: 2
Unique unicode blocks: 1
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.

Most occurring characters

ValueCountFrequency (%) 
31686099.2%
 
91287437.0%
 
1151766.3%
 
.1151766.3%
 
01089226.0%
 
41078985.9%
 
11077065.9%
 
61054325.8%
 
81038835.7%
 
71028915.6%
 
51020495.6%
 
21013715.5%
 
P575883.1%
 
O575883.1%
 
I575883.1%
 
N575883.1%
 
T575883.1%
 
(575883.1%
 
-575883.1%
 
)575883.1%
 

Most occurring categories

ValueCountFrequency (%) 
Decimal Number113750462.2%
 
Uppercase Letter28794015.7%
 
Space Separator1151766.3%
 
Other Punctuation1151766.3%
 
Open Punctuation575883.1%
 
Dash Punctuation575883.1%
 
Close Punctuation575883.1%
 

Most frequent Uppercase Letter characters

ValueCountFrequency (%) 
P5758820.0%
 
O5758820.0%
 
I5758820.0%
 
N5758820.0%
 
T5758820.0%
 

Most frequent Space Separator characters

ValueCountFrequency (%) 
115176100.0%
 

Most frequent Open Punctuation characters

ValueCountFrequency (%) 
(57588100.0%
 

Most frequent Decimal Number characters

ValueCountFrequency (%) 
316860914.8%
 
912874311.3%
 
01089229.6%
 
41078989.5%
 
11077069.5%
 
61054329.3%
 
81038839.1%
 
71028919.0%
 
51020499.0%
 
21013718.9%
 

Most frequent Other Punctuation characters

ValueCountFrequency (%) 
.115176100.0%
 

Most frequent Dash Punctuation characters

ValueCountFrequency (%) 
-57588100.0%
 

Most frequent Close Punctuation characters

ValueCountFrequency (%) 
)57588100.0%
 

Most occurring scripts

ValueCountFrequency (%) 
Common154062084.3%
 
Latin28794015.7%
 

Most frequent Latin characters

ValueCountFrequency (%) 
P5758820.0%
 
O5758820.0%
 
I5758820.0%
 
N5758820.0%
 
T5758820.0%
 

Most frequent Common characters

ValueCountFrequency (%) 
316860910.9%
 
91287438.4%
 
1151767.5%
 
.1151767.5%
 
01089227.1%
 
41078987.0%
 
11077067.0%
 
61054326.8%
 
81038836.7%
 
71028916.7%
 
51020496.6%
 
21013716.6%
 
(575883.7%
 
-575883.7%
 
)575883.7%
 

Most occurring blocks

ValueCountFrequency (%) 
ASCII1828560100.0%
 

Most frequent ASCII characters

ValueCountFrequency (%) 
31686099.2%
 
91287437.0%
 
1151766.3%
 
.1151766.3%
 
01089226.0%
 
41078985.9%
 
11077065.9%
 
61054325.8%
 
81038835.7%
 
71028915.6%
 
51020495.6%
 
21013715.5%
 
P575883.1%
 
O575883.1%
 
I575883.1%
 
N575883.1%
 
T575883.1%
 
(575883.1%
 
-575883.1%
 
)575883.1%
 

x
Real number (ℝ≥0)

HIGH CORRELATION

Distinct count: 57515
Unique (%): 99.9%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 35.149669123888835
Minimum: 29.6071219
Maximum: 40.34519307
Zeros: 0
Zeros (%): 0.0%
Memory size: 450.0 KiB

Quantile statistics

Minimum: 29.6071219
5-th percentile: 30.62360773
Q1: 33.28510016
Median: 35.00594322
Q3: 37.23371212
95-th percentile: 39.15049865
Maximum: 40.34519307
Range: 10.73807117
Interquartile range (IQR): 3.94861196

Descriptive statistics

Standard deviation: 2.60742797
Coefficient of variation (CV): 0.07418072587
Kurtosis: -0.8692761515
Mean: 35.14966912
Median Absolute Deviation (MAD): 1.979294605
Skewness: -0.1348112926
Sum: 2024199.146
Variance: 6.798680617
Histogram with fixed size bins (bins=10)
ValueCountFrequency (%) 
33.090347382< 0.1%
 
39.086286572< 0.1%
 
39.093095442< 0.1%
 
39.098513622< 0.1%
 
37.543401452< 0.1%
 
32.988560042< 0.1%
 
32.956522792< 0.1%
 
32.987670482< 0.1%
 
32.967009262< 0.1%
 
32.993276842< 0.1%
 
39.085964962< 0.1%
 
37.534327342< 0.1%
 
31.619529532< 0.1%
 
39.095684162< 0.1%
 
39.086182572< 0.1%
 
37.252194462< 0.1%
 
32.965734452< 0.1%
 
37.375716872< 0.1%
 
37.318911282< 0.1%
 
37.374016552< 0.1%
 
32.982698062< 0.1%
 
37.540900642< 0.1%
 
39.088875132< 0.1%
 
38.340501342< 0.1%
 
39.119210372< 0.1%
 
Other values (57490)5753899.9%
 
ValueCountFrequency (%) 
29.60712191< 0.1%
 
29.607201091< 0.1%
 
29.610320561< 0.1%
 
29.610964821< 0.1%
 
29.611946741< 0.1%
 
29.612506891< 0.1%
 
29.612762961< 0.1%
 
29.613443091< 0.1%
 
29.61687181< 0.1%
 
29.618479191< 0.1%
 
ValueCountFrequency (%) 
40.345193071< 0.1%
 
40.344300891< 0.1%
 
40.325239961< 0.1%
 
40.325226431< 0.1%
 
40.323401811< 0.1%
 
40.322832371< 0.1%
 
40.322804531< 0.1%
 
40.32262511< 0.1%
 
40.322169021< 0.1%
 
40.321965931< 0.1%
 

y
Real number (ℝ)

HIGH CORRELATION

Distinct count: 57516
Unique (%): 99.9%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: -5.885572340514864
Minimum: -11.64944018
Maximum: -0.99846435
Zeros: 0
Zeros (%): 0.0%
Memory size: 450.0 KiB

Quantile statistics

Minimum: -11.64944018
5-th percentile: -10.60147827
Q1: -8.643840785
Median: -5.17270373
Q3: -3.372824195
95-th percentile: -1.802689797
Maximum: -0.99846435
Range: 10.65097583
Interquartile range (IQR): 5.27101659

Descriptive statistics

Standard deviation: 2.809876457
Coefficient of variation (CV): -0.477417708
Kurtosis: -1.203165882
Mean: -5.885572341
Median Absolute Deviation (MAD): 2.041399535
Skewness: -0.2522877584
Sum: -338938.3399
Variance: 7.895405705
Histogram with fixed size bins (bins=10)
ValueCountFrequency (%) 
-6.976270112< 0.1%
 
-6.985841732< 0.1%
 
-7.056922532< 0.1%
 
-6.97875552< 0.1%
 
-6.959748732< 0.1%
 
-6.963556652< 0.1%
 
-2.463909842< 0.1%
 
-7.103742322< 0.1%
 
-6.983182632< 0.1%
 
-2.519950412< 0.1%
 
-2.528715732< 0.1%
 
-6.98022042< 0.1%
 
-6.989456222< 0.1%
 
-2.506589542< 0.1%
 
-6.956745642< 0.1%
 
-7.104625032< 0.1%
 
-2.516619392< 0.1%
 
-2.494545592< 0.1%
 
-6.962475162< 0.1%
 
-6.983115122< 0.1%
 
-2.496458682< 0.1%
 
-9.28934922< 0.1%
 
-2.515320722< 0.1%
 
-6.990548642< 0.1%
 
-6.96425762< 0.1%
 
Other values (57491)5753899.9%
 
ValueCountFrequency (%) 
-11.649440181< 0.1%
 
-11.648377591< 0.1%
 
-11.586296561< 0.1%
 
-11.568576791< 0.1%
 
-11.566804571< 0.1%
 
-11.564508651< 0.1%
 
-11.564323571< 0.1%
 
-11.562315921< 0.1%
 
-11.562288981< 0.1%
 
-11.561618981< 0.1%
 
ValueCountFrequency (%) 
-0.998464351< 0.1%
 
-0.9989161< 0.1%
 
-0.999012091< 0.1%
 
-0.999117021< 0.1%
 
-0.99946921< 0.1%
 
-0.999506511< 0.1%
 
-0.999522321< 0.1%
 
-1.000585191< 0.1%
 
-1.00152081< 0.1%
 
-1.001987841< 0.1%
 

Interactions

Correlations

Pearson's r

Pearson's correlation coefficient (r) is a measure of linear correlation between two variables. Its value lies between -1 and +1, with -1 indicating total negative linear correlation, 0 indicating no linear correlation, and +1 indicating total positive linear correlation. Furthermore, r is invariant under separate changes in location and (positive) scale of the two variables, implying that for a perfectly linear relationship the steepness of the line does not affect r; only the sign of the slope does.

To calculate r for two variables X and Y, one divides the covariance of X and Y by the product of their standard deviations.
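
The calculation described above can be sketched in a few lines of plain Python. This is a minimal illustration, not the code the report itself runs; the function name pearson_r and the toy values are assumptions made for the example.

```python
import math


def pearson_r(xs, ys):
    """Covariance of X and Y divided by the product of their standard deviations.

    The 1/n factors in the covariance and in the standard deviations cancel,
    so plain sums of deviations are used.
    """
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two equal-length samples with at least 2 points")
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("r is undefined when one variable is constant")
    return cov / math.sqrt(var_x * var_y)


# Hypothetical toy data: ys is an exact linear function of xs.
xs = [1.0, 2.0, 3.0, 4.0]
ys = [2.0 * x + 5.0 for x in xs]
print(pearson_r(xs, ys))                      # close to +1
print(pearson_r(xs, [-y for y in ys]))        # close to -1
print(pearson_r([10 * x + 3 for x in xs], ys))  # unchanged by shift and scale
```

Rescaling or shifting either input leaves the result unchanged, which is the invariance property mentioned above.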

Spearman's ρ

Spearman's rank correlation coefficient (ρ) is a measure of monotonic correlation between two variables, and is therefore better at catching nonlinear monotonic correlations than Pearson's r. Its value lies between -1 and +1, with -1 indicating total negative monotonic correlation, 0 indicating no monotonic correlation, and +1 indicating total positive monotonic correlation.

To calculate ρ for two variables X and Y, one divides the covariance of the rank variables of X and Y by the product of their standard deviations.
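
As a rough sketch of that recipe, the example below ranks both variables (tied values share their average rank) and then applies the Pearson formula to the ranks. The helper names average_ranks, pearson_r and spearman_rho, and the toy data, are assumptions for illustration only.

```python
import math


def average_ranks(values):
    """Return 1-based ranks; tied values receive the mean of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values equal to values[order[i]].
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        shared = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = shared
        i = j + 1
    return ranks


def pearson_r(xs, ys):
    """Covariance divided by the product of the standard deviations."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / math.sqrt(var_x * var_y)


def spearman_rho(xs, ys):
    """Spearman's rho: Pearson's r computed on the rank variables."""
    return pearson_r(average_ranks(xs), average_ranks(ys))


# Hypothetical toy data: monotonic but clearly nonlinear.
xs = [1, 2, 3, 4, 5]
ys = [x ** 3 for x in xs]
print(spearman_rho(xs, ys))  # close to 1: the relationship is monotonic
print(pearson_r(xs, ys))     # noticeably below 1: it is not linear
```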

Kendall's τ

Similarly to Spearman's rank correlation coefficient, the Kendall rank correlation coefficient (τ) measures ordinal association between two variables. Its value lies between -1 and +1, with -1 indicating total negative correlation, 0 indicating no correlation, and +1 indicating total positive correlation.

To calculate τ for two variables X and Y, one determines the number of concordant and discordant pairs of observations. τ is given by the number of concordant pairs minus the number of discordant pairs, divided by the total number of pairs.
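
The pair-counting definition above can be written directly as a small, quadratic-time sketch. It implements the plain definition (often called tau-a) with no tie correction; library implementations commonly apply a tie-adjusted variant, so results can differ when the data contain ties. The function name kendall_tau_a and the toy data are assumptions for illustration.

```python
from itertools import combinations


def kendall_tau_a(xs, ys):
    """(concordant pairs - discordant pairs) / total number of pairs."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two equal-length samples with at least 2 points")
    concordant = 0
    discordant = 0
    for (x1, y1), (x2, y2) in combinations(zip(xs, ys), 2):
        direction = (x1 - x2) * (y1 - y2)
        if direction > 0:
            concordant += 1   # both variables move in the same direction
        elif direction < 0:
            discordant += 1   # the variables move in opposite directions
        # direction == 0 means a tie in x or y; it counts for neither side
    total_pairs = n * (n - 1) // 2
    return (concordant - discordant) / total_pairs


# Hypothetical toy data: one swapped pair out of six.
print(kendall_tau_a([1, 2, 3, 4], [1, 3, 2, 4]))
```

With the toy data there are five concordant pairs and one discordant pair out of six, so the function returns (5 - 1) / 6.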

Phik (φk)

Phik (φk) is a new and practical correlation coefficient that works consistently between categorical, ordinal and interval variables, captures non-linear dependency and reverts to the Pearson correlation coefficient in case of a bivariate normal input distribution. There is extensive documentation available here.

Cramér's V (φc)

Cramér's V is an association measure for nominal random variables. The coefficient ranges from 0 to 1, with 0 indicating independence and 1 indicating perfect association. The empirical estimators used for Cramér's V have been shown to be biased, even for large samples. We use a bias-corrected measure proposed by Bergsma in 2013.
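
A minimal sketch of a bias-corrected Cramér's V follows, assuming the correction takes the commonly cited Bergsma (2013) form: the chi-square based φ² is reduced by (r-1)(k-1)/(n-1) and the table dimensions r and k are shrunk accordingly. This is not taken from the report's source code; the function name and the toy labels are illustrative.

```python
import math
from collections import Counter


def cramers_v_corrected(xs, ys):
    """Bias-corrected Cramér's V for two equal-length sequences of labels."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two equal-length samples with at least 2 points")
    joint = Counter(zip(xs, ys))
    row_totals = Counter(xs)
    col_totals = Counter(ys)
    r, k = len(row_totals), len(col_totals)

    # Pearson chi-square statistic over the full r x k contingency table.
    chi2 = 0.0
    for a, row_n in row_totals.items():
        for b, col_n in col_totals.items():
            expected = row_n * col_n / n
            observed = joint.get((a, b), 0)
            chi2 += (observed - expected) ** 2 / expected

    phi2 = chi2 / n
    # Assumed bias correction (Bergsma 2013 form).
    phi2_corr = max(0.0, phi2 - (r - 1) * (k - 1) / (n - 1))
    r_corr = r - (r - 1) ** 2 / (n - 1)
    k_corr = k - (k - 1) ** 2 / (n - 1)
    denom = min(r_corr - 1, k_corr - 1)
    if denom <= 0:
        return 0.0  # a constant variable carries no association
    return math.sqrt(phi2_corr / denom)


# Hypothetical toy labels: the second column is fully determined by the first.
types = ["hand pump", "communal standpipe"] * 50
groups = ["pump", "standpipe"] * 50
print(cramers_v_corrected(types, groups))  # close to 1 (perfect association)
```

Two columns where one is a pure relabeling of the other score close to 1, while independent columns score 0.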

Missing values

Sample

First rows

Unnamed: 0idamount_tshdate_recordedfundergps_heightinstallerlongitudelatitudewpt_namenum_privatebasinsubvillageregionregion_codedistrict_codelgawardpopulationpublic_meetingrecorded_byscheme_managementscheme_namepermitconstruction_yearextraction_typeextraction_type_groupextraction_type_classmanagementmanagement_grouppaymentpayment_typewater_qualityquality_groupquantityquantity_groupsourcesource_typesource_classwaterpoint_typewaterpoint_type_groupstatus_groupgeometryxy
00695726000.02011-03-14Roman1390Roman34.938093-9.856322none0Lake NyasaMnyusi BIringa115LudewaMundindi109TrueGeoData Consultants LtdVWCRomanFalse1999gravitygravitygravityvwcuser-grouppay annuallyannuallysoftgoodenoughenoughspringspringgroundwatercommunal standpipecommunal standpipefunctionalPOINT (34.93809275 -9.856321769999999)34.938093-9.856322
1187760.02013-03-06Grumeti1399GRUMETI34.698766-2.147466Zahanati0Lake VictoriaNyamaraMara202SerengetiNatta280NaNGeoData Consultants LtdOtherNaNTrue2010gravitygravitygravitywuguser-groupnever paynever paysoftgoodinsufficientinsufficientrainwater harvestingrainwater harvestingsurfacecommunal standpipecommunal standpipefunctionalPOINT (34.6987661 -2.14746569)34.698766-2.147466
223431025.02013-02-25Lottery Club686World vision37.460664-3.821329Kwa Mahundi0PanganiMajengoManyara214SimanjiroNgorika250TrueGeoData Consultants LtdVWCNyumba ya mungu pipe schemeTrue2009gravitygravitygravityvwcuser-grouppay per bucketper bucketsoftgoodenoughenoughdamdamsurfacecommunal standpipe multiplecommunal standpipefunctionalPOINT (37.46066446 -3.82132853)37.460664-3.821329
33677430.02013-01-28Unicef263UNICEF38.486161-11.155298Zahanati Ya Nanyumbu0Ruvuma / Southern CoastMahakamaniMtwara9063NanyumbuNanyumbu58TrueGeoData Consultants LtdVWCNaNTrue1986submersiblesubmersiblesubmersiblevwcuser-groupnever paynever paysoftgooddrydrymachine dbhboreholegroundwatercommunal standpipe multiplecommunal standpipenon functionalPOINT (38.48616088 -11.15529772)38.486161-11.155298
44197280.02011-07-13Action In A0Artisan31.130847-1.825359Shuleni0Lake VictoriaKyanyamisaKagera181KaragweNyakasimbi0TrueGeoData Consultants LtdNaNNaNTrue0gravitygravitygravityotherothernever paynever paysoftgoodseasonalseasonalrainwater harvestingrainwater harvestingsurfacecommunal standpipecommunal standpipefunctionalPOINT (31.13084671 -1.82535885)31.130847-1.825359
55994420.02011-03-13Mkinga Distric Coun0DWE39.172796-4.765587Tajiri0PanganiMoa/MweremeTanga48MkingaMoa1TrueGeoData Consultants LtdVWCZingibaliTrue2009submersiblesubmersiblesubmersiblevwcuser-grouppay per bucketper bucketsaltysaltyenoughenoughotherotherunknowncommunal standpipe multiplecommunal standpipefunctionalPOINT (39.1727956 -4.76558728)39.172796-4.765587
66198160.02012-10-01Dwsp0DWSP33.362410-3.766365Kwa Ngomho0InternalIshinabulandiShinyanga173Shinyanga RuralSamuye0TrueGeoData Consultants LtdVWCNaNTrue0swn 80swn 80handpumpvwcuser-groupnever paynever paysoftgoodenoughenoughmachine dbhboreholegroundwaterhand pumphand pumpnon functionalPOINT (33.36240982 -3.76636472)33.362410-3.766365
77545510.02012-10-09Rwssp0DWE32.620617-4.226198Tushirikiane0Lake TanganyikaNyawishi CenterShinyanga173KahamaChambo0TrueGeoData Consultants LtdNaNNaNTrue0nira/taniranira/tanirahandpumpwuguser-groupunknownunknownmilkymilkyenoughenoughshallow wellshallow wellgroundwaterhand pumphand pumpnon functionalPOINT (32.62061707 -4.22619802)32.620617-4.226198
88539340.02012-11-03Wateraid0Water Aid32.711100-5.146712Kwa Ramadhan Musa0Lake TanganyikaImalaudukiTabora146Tabora UrbanItetemia0TrueGeoData Consultants LtdVWCNaNTrue0india mark iiindia mark iihandpumpvwcuser-groupnever paynever paysaltysaltyseasonalseasonalmachine dbhboreholegroundwaterhand pumphand pumpnon functionalPOINT (32.71110001 -5.14671181)32.711100-5.146712
99461440.02011-08-03Isingiro Ho0Artisan30.626991-1.257051Kwapeto0Lake VictoriaMkonomreKagera181KaragweKaisho0TrueGeoData Consultants LtdNaNNaNTrue0nira/taniranira/tanirahandpumpvwcuser-groupnever paynever paysoftgoodenoughenoughshallow wellshallow wellgroundwaterhand pumphand pumpfunctionalPOINT (30.62699053 -1.25705061)30.626991-1.257051

Last rows

Unnamed: 0idamount_tshdate_recordedfundergps_heightinstallerlongitudelatitudewpt_namenum_privatebasinsubvillageregionregion_codedistrict_codelgawardpopulationpublic_meetingrecorded_byscheme_managementscheme_namepermitconstruction_yearextraction_typeextraction_type_groupextraction_type_classmanagementmanagement_grouppaymentpayment_typewater_qualityquality_groupquantityquantity_groupsourcesource_typesource_classwaterpoint_typewaterpoint_type_groupstatus_groupgeometryxy
5757859390136770.02011-08-04Rudep1715DWE31.370848-8.258160Kwa Mzee Atanas0Lake TanganyikaKitontoRukwa152Sumbawanga RuralMkowe150TrueGeoData Consultants LtdVWCNaNFalse1991swn 80swn 80handpumpvwcuser-groupnever paynever paysoftgoodinsufficientinsufficientmachine dbhboreholegroundwaterhand pumphand pumpfunctionalPOINT (31.37084807 -8.25816008)31.370848-8.258160
5757959391448850.02013-08-03Government Of Tanzania540Government38.044070-4.272218Kwa0PanganiMaore KatiKilimanjaro33SameMaore210TrueGeoData Consultants LtdWater authorityHingililiTrue1967gravitygravitygravityvwcuser-groupnever paynever paysoftgoodenoughenoughriverriver/lakesurfacecommunal standpipecommunal standpipenon functionalPOINT (38.04406992 -4.27221758)38.044070-4.272218
5758059392406070.02011-04-15Government Of Tanzania0Government33.009440-8.520888Benard Charles0Lake RukwaMbuyuni AMbeya121ChunyaMbuyuni0TrueGeoData Consultants LtdVWCNaNTrue0gravitygravitygravityvwcuser-groupnever paynever paysoftgoodenoughenoughspringspringgroundwatercommunal standpipecommunal standpipenon functionalPOINT (33.00944043 -8.52088818)33.009440-8.520888
5758159393483480.02012-10-27Private0Private33.866852-4.287410Kwa Peter0InternalMasangaTabora142IgungaIgunga0FalseGeoData Consultants LtdWater authorityNaNFalse0gravitygravitygravityprivate operatorcommercialpay per bucketper bucketsoftgoodinsufficientinsufficientdamdamsurfaceotherotherfunctionalPOINT (33.86685217 -4.28740983)33.866852-4.287410
575825939411164500.02011-03-09World Bank351ML appro37.634053-6.124830Chimeredya0Wami / RuvuKomstariMorogoro56MvomeroDiongoya89TrueGeoData Consultants LtdVWCNaNTrue2007submersiblesubmersiblesubmersiblevwcuser-grouppay monthlymonthlysoftgoodenoughenoughmachine dbhboreholegroundwatercommunal standpipecommunal standpipenon functionalPOINT (37.63405278 -6.12482968)37.634053-6.124830
57583593956073910.02013-05-03Germany Republi1210CES37.169807-3.253847Area Three Namba 270PanganiKiduruniKilimanjaro35HaiMasama Magharibi125TrueGeoData Consultants LtdWater BoardLosaa Kia water supplyTrue1999gravitygravitygravitywater boarduser-grouppay per bucketper bucketsoftgoodenoughenoughspringspringgroundwatercommunal standpipecommunal standpipefunctionalPOINT (37.16980689 -3.25384746)37.169807-3.253847
5758459396272634700.02011-05-07Cefa-njombe1212Cefa35.249991-9.070629Kwa Yahona Kuvala0RufijiIgumbiloIringa114NjombeIkondo56TrueGeoData Consultants LtdVWCIkondo electrical water schTrue1996gravitygravitygravityvwcuser-grouppay annuallyannuallysoftgoodenoughenoughriverriver/lakesurfacecommunal standpipecommunal standpipefunctionalPOINT (35.24999126 -9.0706288)35.249991-9.070629
5758559397370570.02011-04-11NaN0NaN34.017087-8.750434Mashine0RufijiMadunguluMbeya127MbaraliChimala0TrueGeoData Consultants LtdVWCNaNFalse0swn 80swn 80handpumpvwcuser-grouppay monthlymonthlyfluoridefluorideenoughenoughmachine dbhboreholegroundwaterhand pumphand pumpfunctionalPOINT (34.01708706 -8.750434329999999)34.017087-8.750434
5758659398312820.02011-03-08Malec0Musa35.861315-6.378573Mshoro0RufijiMwinyiDodoma14ChamwinoMvumi Makulu0TrueGeoData Consultants LtdVWCNaNTrue0nira/taniranira/tanirahandpumpvwcuser-groupnever paynever paysoftgoodinsufficientinsufficientshallow wellshallow wellgroundwaterhand pumphand pumpfunctionalPOINT (35.86131531 -6.37857327)35.861315-6.378573
5758759399263480.02011-03-23World Bank191World38.104048-6.747464Kwa Mzee Lugawa0Wami / RuvuKikatanyembaMorogoro52Morogoro RuralNgerengere150TrueGeoData Consultants LtdVWCNaNTrue2002nira/taniranira/tanirahandpumpvwcuser-grouppay when scheme failson failuresaltysaltyenoughenoughshallow wellshallow wellgroundwaterhand pumphand pumpfunctionalPOINT (38.10404822 -6.74746425)38.104048-6.747464